:|l I I I I I I I I I I I I I I llh lllllll 1= II IIHII

SPQEIEALVKHLLMSTAKAHVNKETTMTSPRQQGAGIIDTAAAISTGLYLTG-EDGYGSITLGNVEDTFSFI^ 

610 620 630 640 650 660 670 

2400 2430 . 2460 ■ 2490 2520 2550 2580 2610 

KVAKDLHYTTYrOTDQVKDGFVTIlAPQQLGTFTGK^IRIEPGQTQTITIDIIWSKYHD^IDKKVMPN^ 

I hl = l III: I ::::::: |:||::| | : : | :| I I I : I I I : I I I { I 

NEDKTLNYSTQLTTDTAQKRIDHLGSTSISRDSWRKVrVKaNSSTTWINVI^ 

690 700 710 720 730 740 750 

2640 2670 2700 2730 2754 2784 2814 2844 

DGGEVLSIPYVGFKGEFQNLEVLEKSIYKLVANKEKGFYFQP--KQTNEVPGSEDYTALMTTSSEPIYSTDGTSPIQLKA 
I |: = :|||lll|:||llll llh II hh = lllhl III I II hi hi lllll I =1 
DDGDIVSIPYVGFRGEFQNLAVLEEPIYNLIADGKGSFYFEPXTAQPNTVDISHHYTGLVTGSTELIYSTDKRSDSAIKT 
770 780 790 800 810 820 830 

2874 2904 2934 2964 2994 3024 3054 3084 

LGSYKSIDGKWlLQIJOQKGQPHIiAISmiDQNQDAVAVKGVFIiimEmiLRftKVYRflDDVl^ 

lh:h I ::|:|h hllllllll I llh: llllllh :| I II III III I Ihllll III 

MTFKinCaGYFVIiElDESGKPHIMSENGDITODSriWKGVFLRNYTDLV^ 

850 860 870 880 890 900 910 

3114 3144 3174 3204 3234 3264 3294 3324 

NTENPKSTFLYDTEWKGTTTDGIPLEDGKYKYVLTYYSDVPGSKPQQ^WFDITLDRQAPTLTTATYDKDRRIFKARPAVE 
I :||lh :| III II HI I lllhlllll I llh I hlh :|h:| :|llllh I llhl 

NPKNPKSSIIYPTEWNGTDSDGNALADGKYQYVLTYSSKVPGAAVQTMIFDVIIDRESPVITTATYDETNFTFNPRPAIE 
930 940 950 960 970 980 990 

3354 3384 3414 3444 3474 3504 3534 3564 

HGESGIFREQWYLKKDKDGHYNSVLRQQGEDGILVEDNKVFIKQEKI)GSFILPKEVm)FSHVYYTVEDYAGNLVSAKLE 
illl-milll I I =: : I llllh I nil II 11 lllllllllh 1 = 1 

KGESGLYREQVFYLVADASG-VTTIPSLLKNGDVWSDNKVFVAQfTODGSFTLPLDLADISKFYYTVEDYAGNISYEKVE 
1010 1020 1030 1040 1050 1060 1070 

3594 3624 3654 3684 3711 3741 3771 3801 

DLlNIGNKNGLVNVKVFSPEUISNVDIDFSYSVKDDKGNIlKK-QHHGKDUrtiKLPFGTYTFDL 
:|hll|: III I :: : II I I lllll hlh: : | : : | | | | | | | | | | | | | | | | | ::| 
. NLISIGNEKGLTOOIILDKDTNSPVPILFSYSVTDETGKIVMLPRYAG 

1080 1090 1100 1110 1120 1130 1140 IISO 

3831 3861 3891 3921 3951 3981 4011 4041 

VTVTISEKDSiaCDVLFKVNIiKKRRLLVEFDKLLPRGATVQLVTKTOT^ 

III I :| :| I I I II Ih: I III hhllll : II I II lllll Ull I : III I 

AVVTIIiEDNSTREVNFYWLKDKAISmLIDlDMJ^SGSTIQLVraDGQRlQLPNAKYSK^^ 

1160 1170 1180 1190 1200 1210 1220 1230 

4071 4101 4131 4161 4191 4221 4251 4281 

YSTLENmDLLVSVKEDQVNLTKLTLlNKAPLINaiAEQTDIITQPVFYNAGTHBKl^^ 

I II II hi :| h lllllll I :|| : :|ll hi III . = h I H 
YEFLEELD---VRVLaNQSNVKKLTLINK7aiiKEIiIftEIiAGLEETARYYNaSPEL^ 

1240 1250 1260 1270 1280 1290 1300 

4311 4341 4371 4401 4431 4461 4491 4521 

NAIAALRESRQAIJjJGKETDTSLIiAKAII^TEIKGOTQFVKASPLSQSTYINQVQIAKKLLQKPim'QSEVDKALEm^ 
= h|:| =h llh II I : I = h : 11 1 1 h j: =1 = ||lh l-ll =1 

SAIiASLVAAREQLNGQATDKEia.IAEVSrnfTPTQANFIYYHAENTKQIAYDTATOSAQLVtNQENVTQAVVNQAI^ 
1320 1330 1340 1350 1360 1370 1380 

4551 4581 4611 4641 4671 4701 4731 4761 

AKNQIJiGHETDYSGLHHMIIKANVLKQTSSKYQNASQFAKENYIOTjIKKAELLLSmQATQAQVEELIjNQ 

II hh:|l I I ■■ -III I --W llh h h :: h =1 = hll 1== I = ^ = MM 
AK2USn:j3GQKTDISALRSAVSVSSVLKATDAKYIJSlftSEl!TOCQAYDQAVEAAK^ 

1400 1410 1420 1430 1440 1450 1460 

4779 4809 4839 4845 4860 4890 4920 4950 

RDRVSSAENYSQSIiNDMDSIim'PIN PP NQPQALIFKKBMTKESEVAQKRVLGVTSQTDNQKV 

I : I :: Ih II I : | : | : : : : | : : | h 

mTSlWAKEPAOTATDKKDEGTVTPPPIDSEIVDVQAPPVKDTGNSEHVPlGQKPNPQPT-LPRPVT^ 
1480 1490 1500 1510 1520 1530 1540

4980 5010 5040 5070 5100 5130 5160 5190 

KIOTLPK3GESTPKITVTILLPSLSMi:GIjaTIKLKSIKRE*NTLK]SnUiRHQr^ 
: :1| III: | | :: : || : :: | | 
QVTQLENTGBNDTK- -YVLVPSVIIGLGTLLVSIRRHKEEV 
1560 1570 1580 

A related GBS nucleic acid sequence <SEQ ID 10965> which encodes amino acid sequence <SEQ ID 
10966> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6297> which encodes the amino acid 
sequence <SEQ ID 6298>. Analysis of this protein sequence reveals the following: 

LPXTG motif: 1614-1619 

Possible site: 33 
>>> Seems to have a cleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.2784 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
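
The LPXTG motif reported above is the kind of feature that can be located with a simple pattern search. The following is a minimal sketch, not the tool used for the original analysis; the function name `find_lpxtg` and the toy sequence are hypothetical, and the position convention of the original report (which gives a six-position range) is not stated in the text, so this sketch simply returns the 1-based first and last residue of each five-residue match.

```python
import re

# LPXTG: Leu-Pro-any residue-Thr-Gly, written as a regular expression.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(seq):
    """Return 1-based (start, end) positions of each LPXTG match in seq."""
    return [(m.start() + 1, m.end()) for m in LPXTG.finditer(seq.upper())]


# Hypothetical toy sequence, not taken from any SEQ ID in this document.
toy_sequence = "MKKAALPETGEESNVV"
print(find_lpxtg(toy_sequence))
```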

The protein has homology with the following sequences in the databases:

>GP:AAG09771 GB:AF243528 cell envelope proteinase [Streptococcus thermophilus]
Identities = 465/1125 (41%), Positives = 668/1125 (59%), Gaps = 61/1125 (5%)

VEKKQRFSIiRKYKSGTFSVLIGSVFLVM-TTTVaftDEriSTMSEPTiraHAQQQAQHL 59 
++KK+ FSIiRKyK GT SVL+G+VFL +V2iADEL+++ E + T 
MKKKETFSLRKyKIGTVSVLIfiaVFLFi«3aPSV2WU3ELTSLVETKVEA T 49 

ELSSftESKS(3DTSQITLKmffiKEQSQDLVSEPTTTEljftDTDaASMaNTGSI^^ 119 

+ S+S S + E+ D E T+T++ TD GS+A + SA 

VPnAIVSESASESPW EELVDTSVEATSTDVTTTDNEB-ETPGSBALENSA- - 99 

PPVlTrDVHDWVKTKGaTOKGYEG<X3KVVRVIDTOIDPftHQSriRISDySTAKVKSKEDMIA 179 

NT+V T+ A + + KV + + ++D +TA +E 
---NTEVET---TQPAVETPAISEKKV EEEEKLSVMETTAITNQEE 140 



SHGMHVT I GN 



Query: 1     Sbjct: 1
Query: 60    Sbjct: 50
Query: 120   Sbjct: 100
Query: 180   Sbjct: 141
Query: 239   Sbjct: 200
Query: 299   Sbjct: 258
Query: 359   Sbjct: 318
Query: 419   Sbjct:
Query: 479   Sbjct: 435



G+APEAQVMFMRVF++ + +L++KAIEDAV LGAD INLSLG ANG+ ++ 4 



AIE A++AGVSW+AAGN+ +GS H +P A PDYGLVGh-PST 



D+ GK+ALI+R T+ E lA A GA+GV+IFN++PG++N SM+L 
Query: 539 FISHEFGKAMSQLNGNGTCSLEFDSWSKflPSQKGNElfflHFS>mGLTSDC3YLKPDITAPG 598

FI EFG+A++ + + F++ P+ + ++ FS+WGL++DG LKPD+ APG 

Sbjct: 494 FIPLEFGEftIJ«.--.--NSYKIAEi™TDIRPNPEflGLLSDFSSWGLSM3GELKPDLflAPG 549 

Query: 599 GDIYSTmJinryGSQTGTSrMPQIAGASLLVKQYLEKTQEKn^iPKE^ 658 

G IY+ NDN Y + GTSMASP +AGA++LVKQYI1 T P ++I +VK+LLMS A 
Sbjct: 550 GAIYAAINDMDYANMQGTSMASPHVAGAAVLVKQYIiIATYPTKSPQEIEaLVKm 609 

Query: 659 QIHVMPETKTTTSPRCXSGRGLl^iaSAVTSGLYVTGKDNYGSISL 718 

+ HVN ET TSPRQQGAG+++ A+++GLY+TG+D YGSI4-LGN+ DT +F VT+HN 
Sbjct: 610 KAHVNKETTAYTSPRQQGAGIIDTAAAISTGLYLTGEDGYGSITLGNVEDTFSFTVTLHN 669 

Query; 719 LSNKDKTLRYDTELLTDHVDPQKGRFTIiTSHSLKTYQGGEVTVPMGKVTVRVTMDVSQF 778 

++N+DKTL Y T+L TD + TS S + + + +VTV AN TV + +D S F 

Sbjct: 670 ITNEDKTLNYSTQLTTDTAQKRIDHLGSTSISRDSV/R- -KVTVKRNSSTTVTINVDASSF 727 

Query: 779 TKELTKQMPlSKSYYLEGFTOFRDSQDDQtNRVNIPFVGFKGQFENIAVAEESIYRLKSQ^ 838 

+ELT M NGYYLESFVRF D DD + V+IP+VGF+6+P+NLAV EE lY L + GK 
Sbjct: 728 AEELTGLMKNGYYIiEGFVRFTDimjDG-DIVSIPYVGFRGEFQNIAVLEEPIYISn:.!^ 786 

Query: 839 TGFYFDE-SGPKDDIYTOKHFTGLVTLGSETOTSTKTISDNGIjaTLGTFKNRDGKFILEK 897 

GFYF+ 4 + + + H+TGLVT +E ST SD+ + TLGTFKN G F+LE 
Sbjct: 787 GGEYFEPVTAQP^mroISHHYTGLVTGSTELIYSTDKRSDSAIIC^LGTFKNKAGyFVLEL 845 

Query: 898 MAQGNPVIiAISENGDNNQDFAAFKiGVFLRKYQGLKaSVYHASDKEHKNPLWVS-PESFK^ 956 
+ G P LAISPNGD+NQD FKGVFIiR Y L ASVY A D E NPLW S P+S G 

;--G 904 



Query: 1016 LDRQKPVIlSQATFDPET^roFKPEPLKDRGI:JAGVRKDSVFYLERKIHIKPYTVTI^roSYK^ 1075 

+DR+ PV++ AT+D F P P ++G +G+ ++ VFYL + T+ V 

Sbjct: 965 IDRESPVITTATYDETNFTFNPRPAIEKBESGLYREQVPYLVaifflSGVTTIPSLLKNGDV 1024 

Query: 1076 SVErOTC^FVERQADGSFILPLDKaKLGDPYY^lVEDFAG^IVaIAKL 1120 

+V tm. FV + DGSF LPIiD A + FYY VED+AGN++ K+ 
Sbjct: 1025 TVSDNKVFVAaNDDGSFTLPLDLaDISKFYYTVEDYAGNISYEKV 1069 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 543/1676 (32%) , Positives = 821/1676 (48%) , Gaps = 158/1676 (9%) 
Query: 24 
Sbjct: 4 

' Query: '9 ANETPTNNTSSALASTAQD HLVTKANNSPTETQPVAESHSQATETPSPVRNQPVE 133 

+ E+ + +TS T ++ +LV++ + A + ++ A+ P 

Sbjct: 63 SAESKSQDTSQITLCTmEKEQSQDLVSEPTTTELADTDAASMANTGSDATQKSASLPPV 122 

Query: 134 STQEVSKTPLTKQ--NLAVKSTPAISKETPQNID-SNKIITVPKVWNTGYKGEGTWAI- 189 

+T +V TK + K + ID +++ + + V K + ++A 

Sbjct: 123 NT-IOTDWVKTKBAWDKGYEGQGKVVAVIDTGIDPAHQSMRISDVSTAKVKSKEDMLARQ 181 

Query: 190 IDSGLDIN HDALQUUDSTKAK YQNEQQMNAAKAKAGINYGKW 231 

1+ G IN H+ ++ +D+ K ++N + A+ KA I K 

Sbjct: 182 KAAGINYGSWIMDKOTFAHNYVENSDOTKENQFEDFDEDWENPEPDAEAEPKA-IKKHKI 240 

Query: 232 YN inCVIFGHNYVDVHIiaaCEVKSTSHGMBIVTSIATAOT 277 

Y + G + +D + K SHGMHVT I N + T E 

Sbjct: 241 YRPQSTQAPKEWIKTEETDGSHDIDWTQTDDDTKYESHGMHVTGIVAGNSKEAAATGER 300 

Query: 278 lYGVAPEAQVMFMlVFSDEKEGTGPALYVKAIEDAVKLGADSiraiSLGGAISIGSLVNaDDR 337 

G+APEAQVMFMRVF+++ G+ •HL+tKAIEDAV LGAD INLSLG ANG+ ++ 
Sbjct: 301 FIXSlAPEAQVMEWKVFANDIMGSAESLFIKAIEDAVALOaiJVIMLSLGTANGAQIiSGSKP 360 
Query: 338 LIKALEMaRLAGVSWIAaGNDGTFGSGASKPSALYPDYGi:,VGSPSTAREMSVASYMNT 397

L+H-A+E A+ AGVSW+AAGN+ +GS P A PDYGLVGSPST R SVA+ N+ 
Sbjcf: 361 IMEaiEKAKKAGVSVVWWiGNERVYGSDHDDPLATNPDYGLVGSPSTGRTPTSVaAINSK 420 

Query: 398 Tj:iVNKVENIIGLENNRNIjmGIiaAYA---DPKVSDKTPE^ 454 

++ ++ + LEN +LN+G AY+ DK + K + + +V+DY + 
Sbjct: 421 VOTQRMTVKELENRADIJiraGKAIYSESVDFKDIKDSLGYDKSHQPAYVKESTDM 480 

Query: 455 TMSKlMIERG-DITFTKKVVMAINHGRViSailBT^^ 513 

+ GKIALIER + T+ + + A HGA+G +IElilNK G++N +M L IP+ F 

Sbjct: 481 DVKGKIALIERDPNKTYDEMIALAKKHGALGVLIFNWKPGQSNREMRLTANGMGIPSAFI 540 

Query: 514 QKEFGDVLAKNNYK- 
EFG +++ N 

Sbjct: 541 

Query: 570 IYAAIKDlffiYDi™SGTSMASPHWAGATM:WQYLLKEHPELKKGDIERTVKYI« 529 

iy+ NDN Y +GTSMASP +AGA+ LVKQYL K P L K 1 VK IMS A+ 
Sbjct: 601 IySTYMam(KQTGTSMRSPQIAGRSLLVKQYLEKTQE^II^'KEKIADIVK^^ 660 

Query: 630 HLNKDTGAYTSPRQQGAGIIDVAAAVQTCLYLTGGENNYGSVTI^NIKDKISE^^ 689 

H+N +T TSPRQQGaG++++ AV +GI.Y+TG ++NYGS++L6NI D 4-+FDVTVHN+ 
Sbjct: 661 HTOPETKTTTSPRQQGAGLl^IIXaVTSGLYVTG-KDNYGSISLGNITDTMi^^ 719 

Query: 690 NKVaKDLIm'TYIi^rIDQV--KDGFVTLRPQQL6TFTGKTIRIEPGQTQTITI 747 

+ KLYTLTDV+GTL LT+G++ T+ + +DVS++ 

Sbjct: 720 SNKDKTIfiYDTELLTDHVDPQKBRPTLTSHSLKTYQGGEVaVPANGKVTVRVT^ 779 

Query: 748 DMLKKVMENGYFLEGYVRFTDPTOGG-EVIiSIPYVGFKBEFQNLEVLEKSI^ 806 

L K MENGY+LEG+VRF D D ++IP+VGFKB+F+NI. V E+SIY+L + + 

Sbjct: 780 KELTKQMPNGYYLEGFTOFRDSQDDQiaro.VNIPFVGPEGQFENLAVAEESIYRLKSQGKT 839 

Query: 807 GFYFQPK-QTNEVPGSEDYTALMTTSSEPIYSTDGTSPIQIiKALGSYKSIDGKmiMIiDQ 865 

GFYF +++ + +T L+T SE ST S L LG++K+ DGK+IL+ + 

Sbjct: 840 GFYFDESGPKDDIYVGKHFTCLVTIBSETNVSTKTISDNGmTLGTFKNADGKFILEKNA 899 

Sbjct: 

Query: 925 YYSGKTENPKSTFLYDTEWKGTTTDGIPLEDGKYKYVLTYYSDVPGSKPQQMVFDITLDR 984 

+ S+ KSTLT + Gl-GLDGY YV++YY DV G+K Ql-M FD+ IiDR 
Sbjct: 360 FNS-DIRFAKSTTLLGTAFSGKSLTGAELPDGHYHYWSYYPDWGAKRQEMTFDMILDR 1018 

Query: 985 QAPTLTTATyDKDRRIFKAEPA-vffiHGESGIFREQVFYLKKDKDGHYNSVLRQQGEDGILV 1044 

Q P L+ AT+D + FK P + G +G+ -+ VFYL++ KD +V , + V 

Sbjct: 1019 QKPVLSQATFDPETtKFKPEPLKDRGUffiVRKDSVFYIiER-KDIKPyTVTIim^ 1077 

Query: 1045 EimVFIKQEKDQSPILPKEViroFSHVYYTVBDYAGKnijVSAKIiEDLINIGlSrKNGLVl^^ 1104 

EEMK F++++ DGSFILP + YY VED+AGN+ AKL D + + +K+ 

Sbjct: 1078 EDNKTFVERQADtMFILPIJ5KAKrX3DFYYMVEDPAGM\aiAKMDin,PQTLGCT 1137 

Query: 1105 FSPEMfSHVDIDFSYSVKDDKGNIIKKQ HHGKDpSILLKLPFGTYTFDI.PLYDEE 1158 

+ + + + ++Q H + 4L DF+E 
Etojct: 1138 TDGNYQTKETLKDNLEMTQSDTGLVTKQAQLAWHRNQPQSQLT KMNQDFFISPNE 1193 

Query: 1159 RAKLISPKSVTVTISEKDSLKDVLFKVNLLKKftALLVEFDKLLP KGATVQLVTKT 1213 

N K K+++ + L VN+ K + K P GA+V + T 

Sbjct: 1194 DOS KDFVAFKGLKNNVYNDL-TVKVYAKD DHQKQTPIWSSQAGASVSAIEST 1244 

Query: 1214 lSmAroLPKATYSPTDYGKNIPVGDYEIJm'LPSGYSai.Eraa3DLLVSVKEDQVNLT--K^ 1271 

A Y T G + GDY+ VT + E+ +SV + + +T + 

Sbjct: 1245 - AWY6ITARGSKOT1PG0YQYVVTYRDEHGK-EHQKQYTISVNDKKPMITQGRF 129S 

Query: 1272 TLINK APLI^IaLREQTDIITQPVPYNAGTHLKNNYLANLEKAQTrlIKNRVEQTSID 1327 

IN P + + I++VFYA KN + TD 
Sbjct: 1296 DTINGVDHFTPDKTKALDSSGIVREKVFYI1A---KKNGRKPDVTEGKDGI -TVSD 1346 
Query: 1328   Sbjct: 1347
Query: 1380   Sbjct: 1407
Query: 1437   Sbjct:
Query: 1486   Sbjct: 1527
Query: 1532   Sbjct: 1586



N +NT P N 



SEQ ID 8964 (GBS92) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 31 (lane 2; MW 48kDa). 

GBS92-His was purified as shown in Figure 199, lane 9.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2038 

A DNA sequence (GBSx2149) was identified in S.agalactiae <SEQ ID 6299> which encodes the amino 
acid sequence <SEQ ID 6300>. This protein is predicted to be AzlC family protein. Analysis of this protein 
sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.80 Transmembrane 212 - 228 ( 196 - 230)
INTEGRAL Likelihood =       Transmembrane   7 - 183 ( 159 -
INTEGRAL Likelihood = -5.   Transmembrane   9 - 205 ( 188 -
INTEGRAL Likelihood = -2    Transmembrane   7 - 33 ( 13 -
INTEGRAL Likelihood = -1    Transmembrane   5 - 151 ( 135 - 15
INTEGRAL Likelihood = -1    Transmembrane   1 - 77 ( 60 -



Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10235> which encodes amino acid sequence <SEQ ID 
10236> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF10212 GB:AE001921 AzlC family protein [Deinococcus radiodurans] 
Identities = 72/224 (32%) , Positives = 117/224 (52%) , Gaps = 8/224 (3%) 

Query: 6 FKEGVKDALPTALGyiSIGIAFGrVASASDLSAIEVGLMSALVYGGSAQPAMCALLLAKA 65 

F +G + +P LG + LA+ - A A+ LS + LMS + G++QFA L A A 
Sbjct: 7 FWQGFRALVPLWLGTVPERLAYAVTARAAC-LSVGDTCLMSLTTFAGASQFAAAGLFGAHA 66 



Query: 66 DLMTITMTVFLVrai,RNMLMSLHATTIFKSaHLMNQLAIGTLITDESYGV-LLGEALHHKV 124

++I +T FL+N R++L L + L ++ +TDE+YGV ++ A
Sbjct: 67 GGLSIVLTTFLLNARHLLYGLSLftREIiRLT-IiPQRVVARQFLTDKAYGVAWSGARLPGG 125

Query: 125 VSPSWTfflGNlSrWSmTWISTIIGTIirXMTIHIPEMFGIiDEMiVAMFI^^ 134 

++ +++ G + YL+W +ST++G Ii GS- +P PE G+ F+GL V ++ 
Sbjct: 126 LTEaPLLGAELSLYLSWOTSTIiGAIJUSSVLPPPEQLGVGWFPIAFLGLLV PLW 181 

Query: 185 IX3KRLVVYVIASVGLSYFLIATFLSGMiSVLIiRTVVGCSVGVVIi 228

D RL + V + GL +L+ LGL +LIA V G +G L 
Sbjct: 182 D--RLSLLVAIJWM3I<3GWALSRVLPGaLVILIA6VGGALriGaflL 223 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2039 

A DNA sequence (GBSx2150) was identified in S.agalactiae <SEQ ID 6301> which encodes the amino
acid sequence <SEQ ID 6302>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3794 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2040 

A DNA sequence (GBSx2151) was identified in S.agalactiae <SEQ ID 6303> which encodes the amino 
acid sequence <SEQ ID 6304>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5087 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10233> which encodes amino acid sequence <SEQ ID
10234> was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 10 SNIX3YPRIX3EQREWKQ&IEaFIiffiGlSILEQKDLEKQLKQrj?Ill^ 69 

SNLGYPR+GE REWK+A+E+FWA + ++ L +K-1-LR+NHL+ Q-hE +DLIPVGDF+ 
Sbjct: 4 S^aGYPRIGE^^REWKK^alESFWflNDTTEEQIlIlATMKBLRIJilHLRVQ^ 63 



Query: 70 CYDHVLDLSFQPimPKRFDEY--ERHLDLYFAIARGDKI3NVASSMKKWElS^^ 127

YDHVLD+t P +1PKRF + L YFA+ARG K+ A M KW+NTNYHYIVPE 
Sbjct: 64
Query: 128   Sbjct: 124
Query: 186   Sbjct: 183
Query: 246   Sbjct: 243
Query: 305   Sbjct: 303
Query: 363   Sbjct: 363
Query: 418   Sbjct: 422
Query: 473   Sbjct: 482
Query: 538   Sbjct: 542
Query: 598   Sbjct: 602
Query: 658   Sbjct: 662
Query: 718   Sbjct: 722



64 LYDHVLDMAVMFGIIPKRFLQQGDTPTLSTYFAMARGSKNAQACEMTraTYWTNyHyiVPE 



L+ YLEA+ +G KPVI GP ++V Ii+ G 



LY QV Q+L+nSGA IQ+DEP VT 



LPV G GLDF+HG A+NL A++ G +K L AG1++GRNIW ML E 



RL LQPS SLLHVPVTTK E LDP L L+FA+EKL EL L 



P+LPTniGSFPQ+ ++R+ R W++G IB +Y+ +K+ I +'HI IQE+L LDVLVHG 



REGLPL+ + QQ YLD AV+AF+ + + VK TQIHTHMCYS+F B+I++I LDADVIS 



IETSRSHG++1 +FE Y GIGLGVYDIHSPR+P++EE++ I+R+L L 



E ET+2iAL+ +V+A + R++L 
lEKETVAALKKMVAAARAAREEL 752 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2041 

A DNA sequence (GBSx2152) was identified in S.agalactiae <SEQ ID 6305> which encodes the amino 
acid sequence <SEQ ID 6306>. This protein is predicted to be metH. Analysis of this protein sequence 
reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05348 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 301/610 (49%), Positives = 437/610 (71%), Gaps = 9/610 (1%)

MSKFLEKLKTDILVMGMC3TLI,YTYGlJ3TCHESVNVTHPEKVIiAIHQAyiEAGADVIQT 60 
M+ 1-E LKT+ILV DGAMGTMiY G+D C E NVT PBK++A H AY+EAGADVIQT 
iranJVEMiKTOILVGIXSaMGTLLYEQGIDRCFBEDMmDPBKIVaAHVRYV^ 60 



h R +TDLP+I ++S+ 



h EA 4+1. LGAD++G+NC +GPY M++SL4 V L ++Y S YPK&S 



Query: 1     Sbjct: 1
Query: 61    Sbjct: 61
Query: 121   Sbjct: 120
Query: 181   Sbjct: 179
Query: 241   Sbjct: 237
Query: 301   Sbjct: 295
Query: 358   Sbjct: 355
Query: 418   Sbjct: 415
Query: 478   Sbjct: 475
Query: 538   Sbjct: 535
Query: 598   Sbjct:





V +G\7RL+GGCCGTTP+H+RA ^ ++GLKP+ K V 



+ADNSri++ R+ NL++ ++++ ++ L+H+ CKD NLIGLQS L+G+ LG +hia 



TGDPTK+GDFPGATSVYDVTSF+L+SLIRQIiN+G+S+SG Ii + +F+V AAFNENV++ 



L R V+ +EKK+ +GaDYFMTQPl++ ++++ + TK +E+P +1GIMP+ H 



H1JEVPGIKL++ + + +D++ L +KSL+D A +YFNGIYLITPFLRY + 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2042 

A DNA sequence (GBSx2153) was identified in S.agalactiae <SEQ ID 6307> which encodes the amino 
acid sequence <SEQ ID 6308>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)
INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175)
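
Lines of the form "INTEGRAL Likelihood = ... Transmembrane a - b ( c - d)" are regular enough to be read programmatically. The following is a minimal sketch under stated assumptions: the function name `parse_tm_segments` and the dictionary keys are hypothetical, and the reading of the parenthesized pair as a wider flanking range around the predicted segment is an assumption, since the text does not define it.

```python
import re

# One predicted segment per line: likelihood, segment range, parenthesized range.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Extract every well-formed INTEGRAL line from a block of report text."""
    segments = []
    for m in TM_LINE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            # Assumed meaning: a wider range flanking the core segment.
            "outer": (int(m.group(4)), int(m.group(5))),
        })
    return segments


report = (
    "INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)\n"
    "INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175)\n"
)
for seg in parse_tm_segments(report):
    print(seg)
```

Lines damaged in extraction (a missing bracket or number) simply fail to match and are skipped, which is a deliberate choice here so that partial rows are not silently misread.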



Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10231> which encodes amino acid sequence <SEQ ID
10232> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC01354 GB:AL390975 putative integral membrane protein
[Streptomyces coelicolor A3(2)]
Identities = 38/98 (38%), Positives = 59/98 (59%)

Query: 113 RIADDVARFGGSWTFIIVFVSIMAIWMLVNIMKPFGIQFDPYPFILIJNrLaLSTIAaiQAP 172 

R++4- VKRF G+ FI+ ++ +VJ++ N+ P G++FD YPFI L I) LS R+ AP 
Sbjct: 47 RLSERV;y?FLGTGRFIVWMTWIILVfVVWNVSAPSGLRFDEYPFIFLT3aiLSLQaSYAAP 106 

Query: 173 hXmS<3miTmi)mmi^P^!VmT3El£lRUjHEKL 210 

LI+++QNR D DR+ D N+ S + L +1 
Sbjct: 107 IiIIiUUJNRQDDRDRVNLEQDRKQNERSIflDTETraiTREI 144 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8965> and protein <SEQ ID 8966> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -3.84
GvH: Signal Score (-7.5): -5.05
Possible site: 53
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -9.55 threshold: 0.0
INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)
INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175)
PERIPHERAL Likelihood = 5.46 27
modified ALOM score: 2.41

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01598(637 - 930 of 1341)

GP|9714438|emb|CAC01354.1||AL390975(47 - 144 of 198) putative integral membrane protein
{Streptomyces coelicolor A3(2)}
%Match = 8.2
%Identity = 38.8 %Similarity = 51.2
Matches = 38 Mismatches = 38 Conservative Sub.s = 22

600 630 660 690 720 750 780 810 

MKEEEKPFNVEERLNKQATIGQRIADDVRRFGGSWTFIIVFVSIMAIWMLVNIMKPKSIQFDPYPFIL^^ 

l = := nil 1= 11= == :|:: 1= I |::|| lllhl I II h 
RIJDQPRPPRRRLLPEWDPESFGRLSERVARFLGTGRFIVMylTWIILWVVlWSflPSGLRPDEYPFlFLTLMLSr<^ 
40 50 SO 70 80 90 100 

840 870 900 930 960 990 1020 1050 

APLIM4SQNRAADyDRLQaRNDFN\mTSELEIRLLHEKIDHMVQQDQPEliEIQKLQTEMLVSLGNQIAQLKQ^ 
lll|:=:||| I 11= I 1= I = I =1 

APLILLAQNRQDDRDRVNLEQDRKQNERSIADTEYLTREIAALRIGLGEVATRDWIRSELQDDVRDLEERQNGHHPDRGV 

SEQ ID 8966 (GBS393) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 75 (lane 3; MW 30.8kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 177 (lane 4; MW 56kDa) and in Figure
83 (lane 6; MW 56kDa).

GBS393-GST was purified as shown in Figure 217, lane 5.

Example 2043

A DNA sequence (GBSx2154) was identified in S.agalactiae <SEQ ID 6309> which encodes the amino
acid sequence <SEQ ID 6310>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.29 Transmembrane 274 - 290 ( 271 - 291)

Final Results

bacterial membrane --- Certainty=0.2317 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 63 WGTDSTQSNIDKLVANPQVQAMAILGFGGGKAIJDTAKMVAKELGKKSFTIPTICSNCS 122 
++G + + 1++L + + D ++G GGGK LDTAK Va +L K +PT1 S + 

Sbjct: S2 IFGGECSDEEIERLSGLVE-E 



; 182 TLEAKTNKLPHT-AVIAjaVRLSSKEAEfXQFGEQGLKDVE^^ 238 

+ N ++ A+A E ++G + VE + A+E+I A +L 

; 181 KQKXAEIimTGEIXSSMTAraiJUUiCYETLLEyGVLAKRSVEEKSVT 240 



Query: 298 EKWaRETOCSIfiLPTTtADVSL---SEKDIPKIVEIAiyrrnffi---VKlTrPFDP^^ 351 

E+V F + +GLPTTLA++ Ii S++D+ K+ E A NE + P K A+ 
Sbjct: 292 EEWSFCEEVGLPTrrAEIGIiDGVSDEDIMKVAEKACDKNETIHNEPQPVTSKDVFFALK 351 

Query: 352 AADAFGQ 358 

AAD +G+ 
Sbjct: 352 AA0RYGR 358 

There is also homology to SEQ ID 3078. 

SEQ ID 6310 (GBS123) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 7; MW 43.3kDa).

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2044

A DNA sequence (GBSx2155) was identified in S.agalactiae <SEQ ID 6311> which encodes the amino 
acid sequence <SEQ ID 6312>. Analysis of this protein sequence reveals the following: 

Possible site: 39
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0974 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 6313> which encodes the amino acid
sequence <SEQ ID 6314>. Analysis of this protein sequence reveals the following:

Possible site: 17
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2368 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/167 (55%) , Positives = 121/167 (72%) 

Query: 1 MKIAIlGYSGSGKSTUffiKLGim^CNVLHLDSIHFAPNWEERKVDDMIDDVSNMLEKRT 60

+KIAIIG+SGSGKSTLflE LG +Y+C V HLD +HF+ NW+ER DMI D+S L K+
Sbjct: 1 LKIAIIGHSGSGKSTLARFLGQHYHCEVFHLDQLHFSSNWQERSDHDMIADLSTCLLKQD 60

Query: 61 WIIEGNyKKXjLYQERLADaDEIIFFDFNRFNCLWRAFKRYCKFRGKTRPDMANGCPEKLD 120
IIEGNY LY+BR+++AD 11+ +F+RF+C++RAFKRY +RGKTRPDMft.+ C EK D

Sbjct: 61 LIIEGNYANCIiYEEHMSEADyilYTOFSRFHCVYEAFKRYIiNyRGKTRPDMADNCQEKFD 120

Query: 121 FEFISWILKDGRSDKQKSNYEQVVEDYPQKIKII1KHQRDLDQYI.KEL 167
F+ WIL DGRS Q Y+ W+ Y K +L +Q+ Ii Y+ +
Sbjct: 121 VAFVKMirjiDGRSRlJQLKKYQSVVQKYSHRriVLTNQKQLSHVMNTI 167
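
The percentages printed in the alignment headers can be checked against the counts beside them. The following is a minimal sketch, assuming the percentage is the count divided by the alignment length and rounded down to a whole number; that rounding rule is inferred from figures in this document (for example 437/610 is reported as 71%) rather than stated by it, and the function name `check_header` is hypothetical.

```python
import re

# Matches fields such as "Identities = 92/167 (55%)".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_header(line):
    """Recompute each percentage in an alignment header by integer truncation."""
    result = {}
    for name, count, length, reported in FIELD.findall(line):
        count, length, reported = int(count), int(length), int(reported)
        result[name] = {
            "count": count,
            "length": length,
            "reported": reported,
            "computed": 100 * count // length,  # assumed rule: round down
        }
    return result


header = "Identities = 92/167 (55%) , Positives = 121/167 (72%)"
for name, info in check_header(header).items():
    print(name, info)
```

A mismatch between `reported` and `computed` is a useful flag for a digit that was damaged during text extraction, although it cannot say which of the numbers is wrong.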

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2045 

A DNA sequence (GBSx2156) was identified in S.agalactiae <SEQ ID 6315> which encodes the amino
acid sequence <SEQ ID 6316>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
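
Each "Final Results" block lists a certainty value per candidate location, and the predicted location is the one with the highest value. The following is a minimal sketch of reading such a block; the names `parse_final_results` and `best_location` are hypothetical, and the sample text is written in the cleaned form "bacterial <location> --- Certainty=<value> (<label>)", which is an assumption about the layout of the original program output.

```python
import re

# Captures location, certainty value and the label in parentheses.
RESULT = re.compile(
    r"bacterial\s+(\w+)\W+Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples."""
    return [(loc, float(val), label) for loc, val, label in RESULT.findall(text)]


def best_location(text):
    """Return the entry with the highest certainty, or None if nothing parses."""
    entries = parse_final_results(text)
    return max(entries, key=lambda e: e[1]) if entries else None


block = """bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>"""
print(best_location(block))
```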

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA41941 GB:X59250 initiation factor IF-1 [Lactococcus lactis]
Identities = 62/72 (86%), Positives = 70/72 (97%) 

Query: 1 NIWCEDVIEIEGraATETMPMAMFTVHLENCSlQIIATVSGKIRKNYIRILVGDRVTVEMSPY 60 
MAK+DVIE++GKVV+TMPNaMFTVELENGHQ+LAT+SGKIRKNYIRIL GD+V VE+SPY

Sbjct: 1 MRKDDVIEVIX3ranm™ENaMFTVEr.ENGHQVIATISGKIRKNYIRILPGDKVQVE^ 60 
Query: 61 DLTRGRITYRFK 72

DLTRGRITYRFK
Sbjct: 61 DLTRGRITYRFK 72



A related DNA sequence was identified in S.pyogenes <SEQ ID 6317> which encodes the amino acid
sequence <SEQ ID 6318>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3253 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 67/67 (100%) , Positives = 67/67 (100%) 

Query: 6 VIEIEGKVVETMPNMFTVELEWGHQIBATVSGKIRKlSnflRILVGDRVTV™ 65 

VIEIEGRVTOTMEOSffiMFTVELENGHQILATTOGKIRKNYIRILVGDRVTVEMSPYDL^ 
Sbjct: 1 VIEIEGKVVETMENMFTVELENGHQILATVSGKIRKNyiRILVGDRVTVEMSPYDLTRG 60

Query: 66 RITYRFK 72 

RITYRFK 
Sbjct: 61 RITYRFK 67 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2046 

A DNA sequence (GBSx2157) was identified in S.agalactiae <SEQ ID 6319> which encodes the amino 
acid sequence <SEQ ID 6320>. This protein is predicted to be adenylate kinase (adk). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA41940 GB:X59250 adenylate kinase [Lactococcus lactis]
Identities = 146/214 (68%), Positives = 170/214 (79%), Gaps = 6/214 (2%)

Query: 1 MNLLIMGLPGaGKGTQJ^lVEEFGVAHISTGDMFRAAMMIQTEMGRLAKSYIDKGELVP 60 

MNLLIMGLPGAGKGTQA IV+ +GV HISTGDMFRAAM N+TEMG+LAKS+IDKGELVP 
Sbjct: 1 MNLLIMGLPGAGKGTQAEFIVKlNYGWISTGDMFRAiyilKNETEMGKIAKSFlDKGELVP 60 

Query: 61 DEVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLESLGLRLDGVINIKVDPSC 120 

DEVTNGIVKERIA+DDI GFLLDGYPRTI+QAHALD LEELG++LD V+NI V+P+ 
Sbjct: 61 DEVTNGIVKERLAQDDIKASGFLLDGYPRTIDQAHALDTKLEELGIKLDAWNIWNENI 120 

Query: 121 lilERLSGRlINRKTGETFHKVENPPV DYKEEDYYQREDDKPETVKRRLDVNIAQ 174 

Ii++RIjSGR 1 R G T+HK+FNP D YQR DD PETVK JilijDVNI + 

Sbjct: 121 LVDRIfiGRYICRNCGATYHKIEOTTKVEGTCDVaffiHDLYQRftDDVPETVKN^ 180



Query: 175 GEPILEHYRKLGLVTDIEGNQEITEVFAEVEKaL 208 
PI+KHY +LGLV +IEG QEI++V D++K I. 
Sbjct: 181 SAPIIEHYTELGLVKNIEGEQEISQVTDDIKKVL 214



A related DNA sequence was identified in S.pyogenes <SEQ ID 6321> which encodes the amino acid
sequence <SEQ ID 6322>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 208/212 (98%) , Positives = 212/212 (99%) 



DEVTNGIVKBRIMDDIAEKGFLLDGYPRTIEQflHALDATLEELGLRLDGVINIKVDPSC 



HTOKLGLVTDIEGNQEIT+VFADVEKZUJLELK 
HYRKIiGLVTDIEGNQEITDVFflDVEKALLEa:,K 212 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8967> and protein <SEQ ID 8968> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 0
McG: Discrim Score: -1.04
GvH: Signal Score (-7.5): -1.08

Possible site: 17
>>> Seems to have no N-terminal signal sequence
ALOM program count: 0 value: 6.79 threshold: 0.0
PERIPHERAL Likelihood = 6.79 106
modified ALOM score: -1.86



Query: 1     Sbjct:
Query: 61    Sbjct: 61
Query: 121   Sbjct: 121
Query: 181   Sbjct: 181



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

over 213aa 

Lactococcus lactis 

EGAD|8612| adenylate kinase Insert characterized
SP|P27143|KAD_LACLA ADENYLATE KINASE (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). Edit characterized
GP|44074|emb|CAA41940.1||X59250 adenylate kinase Insert characterized
PIR|S17987|S17987 adenylate kinase (EC 2.7.4.3) - subsp. lactis Insert characterized
PIR|B44812|B44812 adenylate kinase (EC 2.7.4.3) - Insert characterized

ORF01658(301 - 924 of 1236)

EGAD|8612|8416(1 - 214 of 215) adenylate kinase {Lactococcus lactis} SP|P27143|KAD_LACLA
ADENYLATE KINASE (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). GP|44074|emb|CAA41940.1||X59250
adenylate kinase {Lactococcus lactis} PIR|S17987|S17987 adenylate kinase (EC 2.7.4.3) -
Lactococcus lactis subsp. lactis PIR|B44812|B44812 adenylate kinase (EC 2.7.4.3)
Lactococcus lactis
%Match = 34.8
%Identity = 69.5 %Similarity = 81.0
Matches = 146 Mismatches = 38 Conservative Sub.s = 24



QaYSF*LQRVLKV-*mSRAIF*RDAMLDS*IQQimi*VDSVNIiFCPLISPTCCVGPI*EQNK^^ 

IIIIMIIIIIIII 

MNLLIMGLPGAGKC3 



TQAAKIVEEFCTMISTGDMFRftAMANQ^ITOGRLMSYIDKBELVPDEVTNGIVKERLlffiDDIA^ 

III Ih :|l llllllllllll hlllhlllhllllllllllllllllllllhlll llllllllllhll 

TQAEFIVKNYGViraiSTGDMFRAaMKNETEMGKLaKSFIDKGELVPDEVTNGIVKERLAQDDIKASGFLLDGYPRTI 



HALDATLEELGLRLDGVINIKVDPSCLIERLSXRIINRKTGETFHKVFNPP VDY-KEEDYYQREDDKPETVKRRL 

nil h|: |--:|il II I I I •■ II : II I I I III 11 Mill II 

HALDTMLEELGIKLDAVWIVVinWILVDRLSGRYICiaiCGATYHKIFNPTKVEGTCT^ 



834 864 894 924 954 984 1014 1044 

DVNIi^EPILEHYRKLGLVTDIEGNQBITEVFMVEK2iLLELK*IMLIYLHK*ISNDILS*SDL*LLPLYRGH^ 
MM : IIMII :||ll :III ll|::| WM I 



SEQ ID 8968 (GBS114) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 9; MW 26.9kDa). 

The GBS114-His fusion product was purified (Figure 108A; see also Figure 200, lane 8) and used to 
immunise mice (lane 1+2+3 product; 20µg/mouse). The resulting antiserum was used for Western blot 
(Figure 108B), FACS (Figure 108C), and in the in vivo passive protection assay (Table III). These tests 
confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective protective 



Example 2047 

A DNA sequence (GBSx2158) was identified in S.agalactiae <SEQ ID 6323> which encodes the amino 
acid sequence <SEQ ID 6324>. This protein is predicted to be preprotein translocase SecY subunit (secY). 
Analysis of this protein sequence reveals the following: 



Possible site: 35
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-14.01 Transmembrane 217 - 233 ( 209 - 240)
INTEGRAL Likelihood = -8.65 Transmembrane 314 - 330 ( 307 - 334)
INTEGRAL Likelihood = -6.16 Transmembrane 369 - 385 ( 363 - 392)
INTEGRAL Likelihood = -5.36 Transmembrane  19 -  35 (  17 -  40)
INTEGRAL Likelihood = -3.93 Transmembrane 180 -     ( 179 - 199)
INTEGRAL Likelihood = -3.03 Transmembrane 395 -     ( 392 - 412)
INTEGRAL Likelihood = -2.55 Transmembrane 151 - 167 ( 151 - 168)
INTEGRAL Likelihood = -2.02 Transmembrane 117 - 133 ( 117 - 133)
INTEGRAL Likelihood = -0.64 Transmembrane 270 - 286 ( 269 - 286)



Final Results

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
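
For readers who want to work with the transmembrane predictions programmatically, here is a minimal sketch of parsing rows of the form "INTEGRAL Likelihood = ... Transmembrane start - end ( ... )" into records ordered by position. The `Segment` type, the `parse_segments` helper and the regular expression are illustrative assumptions, not part of the prediction program; the sample rows are three complete entries from the listing for SEQ ID 6324.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # ALOM likelihood score (more negative = more confident)
    start: int         # first residue of the predicted core segment
    end: int           # last residue of the predicted core segment


# Hypothetical pattern for one prediction row; the parenthesised outer
# range is ignored here.
ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list[Segment]:
    """Extract predicted segments and sort them by start position."""
    found = (
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in ROW.finditer(text)
    )
    return sorted(found, key=lambda seg: seg.start)


sample = """
INTEGRAL Likelihood = -8.65 Transmembrane 314 - 330 ( 307 - 334)
INTEGRAL Likelihood = -5.36 Transmembrane 19 - 35 ( 17 - 40)
INTEGRAL Likelihood = -2.02 Transmembrane 117 - 133 ( 117 - 133)
"""

segments = parse_segments(sample)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Rows with an unreadable coordinate simply fail to match and are skipped, which seems the safer behaviour for scanned listings.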



A related GBS nucleic acid sequence <SEQ ID 9467> which encodes amino acid sequence <SEQ ID 9468> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA41939 GB:X59250 SecY protein [Lactococcus lactis]
Identities = 292/433 (67%), Positives = 361/433 (82%), Gaps = 2/433 (0%)



Query: 


1 


Sbjct: 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 


Query: 


241 


Sbjct: 


241 


Query: 


299 




301 




359 


Sbjct! 


361 




419 


Sbjct: 





MFLtOXRDALIOTCMVRNKILFTIFIIiVFRIGTHITVPGIl^KSLEQMGELPFLNMIJSILV S 0 
MF K L++A KVK TO +ILFTIFIL VFR+G HIT PG+NV++L+Q+ +LPFL+M+NI1V 
MFFKTLKEAFKVKDVRARILFTIFILBVFia^GaHITAPGVNVQNLQQVADLPFI^M^ 6 0 



SGNRM+N+S+F+MGVSPYITASI+VQLLQMDILPKFVEW KQGE+GREKLNQRTRYI+Ii 



S+IIPaGI+S IPSA! ++Y++ F+NVR S I S+IFV LI++ + I++ TTF+QQ2ffi 



K+PIQSTKL QGAPTSSYLPIi+VHPflGVIPVIPA SITT P+TI+ F Q G 



I- LQ L+y T GM+ Xa+LI+LF+FFY+FVQVNPEK AENLQK SYIPS+RPG+ TE 



+Y+S LL +LAT+GS+FL IS++PI RQ 



+ALGGTSLI.ILI 1+ +KQLE 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3987> which encodes the amino acid 
sequence <SEQ ID 3988>. Analysis of this protein sequence reveals the following: 

Possible site: 55
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-14.70 Transmembrane 233 - 249 ( 226 - 255)
INTEGRAL Likelihood = -8.12 Transmembrane 330 - 346 ( 323 - 350)
INTEGRAL Likelihood = -6.10 Transmembrane 384 -     ( 378 - 403)
INTEGRAL Likelihood = -5.20 Transmembrane  35 -  51 (  33 -  56)
INTEGRAL Likelihood =   .09 Transmembrane 199 - 215 ( 195 - 215)
INTEGRAL Likelihood = -3.56 Transmembrane 167 - 183 ( 165 -    )
INTEGRAL Likelihood = -1.65 Transmembrane 411 - 427 ( 411 - 428)
INTEGRAL Likelihood = -1.49 Transmembrane 133 - 149 ( 133 - 149)
INTEGRAL Likelihood = -0.64 Transmembrane 286 - 302 ( 285 - 302)



Final Results

bacterial membrane --- Certainty=0.6880 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 377/434 (86%), Positives = 417/434 (95%)

Query: 1 MFIjKIiLRDMiKSmWimKILFTIFIIjLVFRIGTHITVPQIlWKSLEQmELPFt^^ 60
         MFLK+L+DALK+K VRNKl FTIFI+LVFRIQTHITVPG+N KSI1EQ+ ELPFIiNMIiNLV
Sbjct: 17 MFLKILKnALKIKTVRNKIFFTlFI ILVPRIGTHITVPGVNfiJKSLEQLSELPFIMlLNLV 76

Query: 61
Sbjct: 77



Query: 121 lAFVQSIGITAGFNTLSSVaLVKTEIWQTYLLIGAILTTGSlWVTWnLGEQ^ 180 

LAF QSIGITAGElin'LS+VaLVraP+++TYLLIGA+LTTOS++VTWLGEQITDKBFOT 
Sbjct: 137 liAFAQSlGITAGFOTTiSNWajVKITOIKraLlGaLIjTTO 196 

Query: 181 SMIIFAGIISSIPSAITTIYEDFFVNVRSSAITNSYIFVGILIVAVLaiVFFTTFIQQRE 240 

SMIIEAGIlSSIPaai TI ED+PVNV++S + +SY+ VGILI+AVIAIVFFTT++QQRE 
Sbjct: 197 SMIIFAGIISSIPSAIATIREDYFVNVKRSDLHSSYIiIVGILIIAVIiAIVFFTTIfVQQaE 256 

Query: 241 YKIPIQyTKLVQG&PTSSYLPLKVNPAGVIPVIFASSITTIPSTIIPFFQtlGKEIPWLTK 300 

YKIPIQYTKL+QGAPTSSYLPLKVNPAGVIFVIFASSITTIPSTIIPF QKG+++PWL + 
Sbjct: 257 YKIPIQYTKIMQGAPTSSYLPLKVNPAGVIFVIFASSITTIPSTIIPFVQN6RDLPWIJSIR 316 

Query: 301 LQELLNYQTPVGMIIYAILIILFSFFYTFVQVNPEKTAENLQKNSSYIPSIRPGRETEEY 360 

LQE+ NYQTPVGMI+YA+LIILFSFFYTFVQVMPEKTAENi:iQKNSSYIPS+RPGRETE++ 
Sbjct: 317 LQEIENYQTPVGMIVYALLIILFSFFYTFVQVNPEKTAENLQKNSSYIPSVRPGRETEQF 376 

Query: 361 MSSLLKKLRTIGSVFIAFISLLPIIAQQAIiHLSSSIALGGTSLLILIATGIKGMKQLEGY 420 

MS+LLKKIAT+G++FLAFISL PI AQQAL+LSSSIALGGTSLLILI+TGIEGMKQLEGY 
Sbjct: 377 MSALLKKIATVGAIFIAFISLAPlAAQQAtNLSSSIALGGTSLLILISTGlEGMKQLEGY 436 

Query: 421 LLKRRYVGFMNTTE 434 

LLKR+YVGFMNT E 
Sbjct: 437 LLKRKYVGFMNTAE 450 



A related GBS gene <SEQ ID 8969> and protein <SEQ ID 8970> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 6.16
GvH: Signal Score (-7.5): -4.32

Possible site: 35 
>>> Seems to have 
ALOM program 
INTEGRAL 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 
Transmerabrane 
Tranemeuibrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



69 



217 - 233 { 209 - 24( 

311 - 327 ( 307 - 33< 

369 - 385 ( 363 - 39: 

19 - 35 ( 17 - 4( 

180 - 196 ( 179 - 19! 

395 - 411 ( 392 - 4i: 

151 - 167 ( 151 - 16f 

117 - 133 ( 117 - 13: 

270 - 286 ( 269 - 28< 



*** Reasoning Step: 3

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01657(301 - 1596 of 1902) 

EGAD|6545|6344(1 - 434 of 439) preprotein translocase secy subunit {Lactococcus lactis}
SP|P27148|SECY_LACLA PREPROTEIN TRANSLOCASE SECY SUBUNIT. GP|44073|emb|CAA41939.1||X59250
SecY protein {Lactococcus lactis} PIR|S17985|S17985 preprotein translocase secY -
Lactococcus lactis subsp. lactis
%Match =46.6 

%Identity =67.0 %Similarity =84.1
Matches = 290 Mismatches = 68 Conservative Sub.s = 74

72 102 132 162 192 222 252 282 

HQCKRICSCEP*PIKCL*RWY*SNSSCS*RSWNRAC*KIRR*NSW*W*IN*EIVC*SS*IP*IC*SSYHC*RWBmS 

10 312 342 372 402 432 462 492 522 

NER*LIMFLKIiRDALKVraiVENKILFTIFILLWRIGTHIWPGIlWKSLEQMGELPFLNMIM 

h^hlll II :|llllll|:|lhl III Ihlh^hh : I I I h I = I I I I I I I I : h I = h I I 
MFFKTLKEAFKVKDVEJiRILFTIFILFVFRLGftHITAPGVWQNLQQVADLPFLSMtmVSGNMQI^^ 
10 20 30 40 SO 60 70 

15 

552 582 612 642 672 702 732 762 

VSPYITASIWQLLQMDiriPKFVEWGKQGEVGRRKLNQATRYISLFIAFVQSIGITAGFfrrLSSVALVKrPNVQTYIiLIG 
IIIIIIIIIHIIIIIIIIMIIIl IIIMIIIIIIIIIIhl II lllllllll :|h =h II hlhll 
VSPYITASIIVQLLQMDILPKFVEWSKQGEIGRRKIiNQATRYITLVLftMaQSIGITAGFQaMSSLNIVQ^ 
20 90 100 110 120 130 140 ISO 

792 822 8S2 882 912 942 972 1002 

AILTTGSMVVTWIX3EQITDKGFGNGVSMIIFMIISSIPSAITTIYEDFFVNVRSSAITNSYIFVGILIVAVM 
:|llllllll|:|||| :||||:|IMIIII|:| lllll ::|:: |:||| I I Mil 11 = : : 1 = : || 
25 VLLTTGSKmWGEQINEKGFGSGVSVIIFAGrVSGIPSAIKSVYDEKFIiNTOPSEIPMSWIFVIGIiIIjSAIVIIYVTT 
170 180 190 200 210 220 230 

1032 1062 ^ 1092 1122 1152 1176 1206 1236 

FIQQAEYKIPIQYTKLVQGAPTSSYLPLKVNPAGVIPVIFASS1TTIPSTIIPFFQ--NGKEIPWLTKLQELLNYQTPVG 
30 . |:|||| hlllllll II|I|||||||:||||||IIIIII III! |:||: hi I = Ih II hi I I 

FVQQAERKVPIQYTKLTQGAPTSSYLPLRWPAGVIPVIERGSITTAPATIIOFLQRSQGSNVGWLSTLQ^ 

250 260 270 280 290 300 310 

1266 1296 1326 1356 1386 1416 1446 1476 

35 MIlYAIIillLFSFFXTEXQVNPEKTAENLQKNSSYIPSIRPGRETEEYMSSIiKIOiATIGSVFIAFISLLPIIAQQALHL 
1= Ihlhlhll :| llllll mill llllhllh Ihhl II :|lhlhll :|h:|l II I 
MLFYALLIVLFTKFYSFVQVNPEKMAENLQKQGSYIPSVRPGKGTEaKYVSRIJM^ 

330 340 350 360 370 380 390 

40 1506 1536 1566 1596 1626 1656 1686 1716 

SSSIAIiGGTSLLILIATGIEGMKQLEGYLLKRRYVGFMlSrrTE*NIG*LCQPSlLFFNKSDMLCWIYLKT^ 

:|llllllllll h :|llllllllhl llh 
PKIVALGGTSLLILIQVAICJAVKQIiHEYLLKRKYAGFMEINPLETK 
410 420 430 

45 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2048 

A DNA sequence (GBSx2159) was identified in S.agalactiae <SEQ ID 6325> which encodes the amino 
acid sequence <SEQ ID 6326>. This protein is predicted to be 50S ribosomal protein L15 (rplO). Analysis 
of this protein sequence reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5259 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
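
The "Final Results" blocks throughout these examples share one shape: a compartment name followed by a certainty value and a qualitative label. A minimal sketch of extracting the values and picking the top-scoring compartment follows; the `parse_final_results` helper and its regular expression are hypothetical and written to tolerate the stray spaces seen in scanned listings, not taken from the localisation program itself.

```python
import re

# Hypothetical pattern: compartment name, any separator characters, then a
# certainty written as "0.5259" (optionally with spaces around the point).
RESULT = re.compile(
    r"bacterial\s+(\w+)\W+Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> dict[str, float]:
    """Map each compartment name to its certainty value."""
    return {
        m.group(1): float(f"{m.group(2)}.{m.group(3)}")
        for m in RESULT.finditer(text)
    }


sample = """
bacterial cytoplasm --- Certainty=0.5259 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""

results = parse_final_results(sample)
best = max(results, key=results.get)
print(best, results[best])
```

Choosing the maximum mirrors how the listings are read in the text, where the single "Affirmative" compartment is taken as the predicted location.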



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB54021 GB:U96620 ribosomal protein L15 [Staphylococcus aureus]
Identities = 116/146 (79%), Positives = 128/146 (87%)



Query: 1 MKIHELKPJffiGSRKViajRVGRGTSSGNGKTSGRGQKGQKSJlSGGGTOMFEGG^ 60 

MKIiHELKPAEGSRK RMSVGRG +4-GNGKTSGGG KGQKARSGGGVR 6FEGGQ PLFRR 
Sbjct: 1 MKLHELKPAEGSRSERNRVGRGVATGtraKTSGRGHRSQKRRSCSCraW 60 

Query: 61 MP!aiGFSNI]maSYALVNLDQIiNVFEDQTEVTPVVI.KEaGIVRAEKSGVK^ 120 

+PKRGF+NIN KEYA+VNLDQIiN FEDGTEVTP +L E+G+V+ EKSG+KILGNG L KK 
Sbjct: 61 LPKRGFIWINRKEYAIVNIiDQIJIKPEIXSrEVTPMiLVBSeVVKiraKS^ 120 

Query: 121 LSVKaAKFSKSAEAMTAKGGSIEVI 146 

L+VKA KFS S& AI AKGG+ EVI 
Sbjct: 121 LTVKfflKPSASftftEAIIffiKGGaHEVI 146 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6327> which encodes the amino acid 
sequence <SEQ ID 6328>. Analysis of this protein sequence reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5329 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 135/146 (92%), Positives = 142/146 (96%) 



Query: 1 MKLHELKPAEGSRKVRNRVGRGTSSGNGKTSGRGQKGQKARSGGGVRLGFEGGQTPLFRR 60
         MKLHELK AEGSRKVRNRVGRGTSSGNGKTSGRGQKGQKARSGGGVRLGFEGGQTPLFRR
Sbjct: 1 MKLHELKAAEGSRKVRNRVGRGTSSGNGKTSGRGQKGQKARSGGGVRLGFEGGQTPLFRR 60

Query: 61 MPKRGFSNIHAKEYALVNLDQIJIVFEDGTEVTPVVLKEAGIVRAEKSGVKII^ 120
          +PKRGF+NIN KEYALVNLDQLNVF+DGTEVTP +LK+AGIVEAEKSGVK+LGNGELTKK
Sbjct: 61 IPTOGPTNIMTKEYALVNIJDQIjNVFDDGTEOTPAILKnRGIVHAEKSGVKma^ 120

Query: 121 LSVKARKFSKSAERAITAKGGSIEVI 146
           L+VKARKFSKSAEftAI AKGGSIEVI
Sbjct: 121 LTVKAAKFSKSAERAIIAKGGSIEVI 146



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2049 

A DNA sequence (GBSx2160) was identified in S.agalactiae <SEQ ID 6329> which encodes the amino 

acid sequence <SEQ ID 6330>. Analysis of this protein sequence reveals the following: 

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1162 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB54020 GB:U96620 ribosomal protein L30 [Staphylococcus 
Identities = 40/58 (68%), Positives = 46/58 (78%) 
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Sbjct: 1 MAKLQITLTRSVIGRPETQIOT^ffiAIfiIlKKraSSVVVEDNPAIRGQINiWKHL 58 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6331> which encodes the amino acid 
sequence <SEQ ID 6332>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 56/58 (96%), Positives = 57/58 (97%)

Query: 1 MAQIKITLTKSPIGRKPEQRKTWiU^LGKrjSISSVVKEDNaMRGMV^ 58
         laQIKITLTKSPIGRKPEQRKXVVRLGIfiKIiNSSVVKEnKftAIRGMV AISHI,VTVE+
Sbjct: 1 mQIKITLTKSPIGRKPEQRKTVVALGIiGKIMSSVVKEDNRMRtSNOT 58

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2050 

A DNA sequence (GBSx2161) was identified in S.agalactiae <SEQ ID 6333> which encodes the amino 
acid sequence <SEQ ID 6334>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3226 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2051 

A DNA sequence (GBSx2162) was identified in S.agalactiae <SEQ ID 6335> which encodes the amino 
acid sequence <SEQ ID 6336>. This protein is predicted to be 30S ribosomal protein S5 (rpsE). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22699 GB:M57621 ribosomal protein S5 [Bacillus 
stearothermophilus] 
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Identities = 119/158 (75%) , Positives = 139/158 (87%) 



Query: 6 M&VELEBRVVAIMRVTKVWGGMUJlFAM.WVGDRNGRVGFGTGKaQEVPEA.IRKaVEa 65 
N +EI1EERWA+NRV KWKGGRRLRF+ALWVGD+NG VGFGTGKaQKVPEAIRKft+E 
5 Sbjct: 7 NKLELEERWAVmVZUCWKGGRRLRPSAIiVVVGDKNGHVGFGTGKRQEVPEAIRKaiED 66 



Query: 66 AKKMMVEVPMVGTTIPHEVRSEFGQAKVLLKPAVEGaGVAAGGRVRAVIELaGVaDITSK 125 

AKKN++EVP+VGTTIPHEV FG +++r.KPA EG GV AGG RAV+ELAG++DI SK 
Sbjct: 67 AKKNLIBVPlVGTTIPHEVIQHFGftGEIIIiKPASEGTGVIAGGPARAVIiELaSISDILSK 126 

Query: 126 SLGSNTPINIVEATVEGLKQIiKRaEEVaALRGISVSDL 163 

S+GSNTPIN+VRAT H-OLKQLKRaE+VA LRG +V +l' 
Sbjct: 127 SIGSNTPINMVRATFDGLKQLKRAEDVAKLRGKXVEEL 164 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6337> which encodes the amino acid 
sequence <SEQ ID 6338>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/164 (96%), Positives = 161/164 (97%) 

Query: 1 M!^PKD^BWELEERWAINR^^'KVVKGGREU^EAR]WVVGDRSIGRVGPGTGK^ 60 

MAFKDNAVELEERVVAINRWKVVRGGRRIiRFAALVVVGD NGRVGFGTGKAQEVPEAIR 
Sbjct: 1 MAFKDNAVELEERVVaiNROTKVVKGGRRLRFAALVVVGDGiraRVGFGTG^ 60 

Query: 61 KAVEAAKKNMVEVPMVGTTIPHEVRSEFGGAKVLLKPAVEGAGVAAGGAVRAVIEIJ^^ 120 

KAVEARKKNM+EVPMVGrriPHEV + FOGAKVLLKPAVEG+GVAAGGAVRAVIELAGVA 
Sbjct: 61 KAVEAAKKHMIEVPMVGTTIPHEWTNFGGAKVLLKPAVEGSGVAAGGAVRAVIELAGVA 120 

Query: 121 DITSKSLGSNTPINIVRATVEGIiKQLKRAEEVAALRGISVSDLA 164 

DITSKSLGSNTPINIVRATVEGLKQLKRAEEVAftLRGISVSDLA 
Sbjct: 121 DITSKSLGSNTPINIVRATVEGLKQLKRAEEVAALRGISVSDLA 164 
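
Each alignment in this section is introduced by a summary line of the form "Identities = a/b (p%), Positives = c/d (q%)". The sketch below shows one possible way to pull those counts out of such a line; `parse_summary` and its regular expression are illustrative names, and the code only extracts the numbers as printed without asserting how the alignment program rounded the percentages.

```python
import re

# Hypothetical pattern for one "Label = n/m (p%)" field of a summary line.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\)"
)


def parse_summary(line: str) -> dict[str, tuple[int, int, int]]:
    """Return {label: (count, aligned_length, stated_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


summary = parse_summary("Identities = 158/164 (96%), Positives = 161/164 (97%)")
for label, (count, length, stated) in summary.items():
    print(f"{label}: {count}/{length}, stated {stated}%")
```

Keeping the stated percentage alongside the raw counts makes it easy to spot transcription errors in either value.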

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2052 

A DNA sequence (GBSx2163) was identified in S.agalactiae <SEQ ID 6339> which encodes the amino 
acid sequence <SEQ ID 6340>. This protein is predicted to be 50S ribosomal protein L18 (rplR). Analysis 
of this protein sequence reveals the following: 

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9465> which encodes amino acid sequence <SEQ ID 9466> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB06815 GB:L47971 ribosomal protein L18 [Bacillus subtilis]
Identities = 86/120 (71%), Positives = 97/120 (80%), Gaps = 2/120 (1%)

Query: 4 VISKPDKNKIRQKEHRRVRGKLSGTM)RPRLNIPRSim3IYAQVIDDmGVTIiASASTLD 63 

+I+K KN R KRH RVR KLSGTA+RPRIiN+PRSN lYftQ+IDDV GVTLASASTIiD 
Sbjct: 1 MITKTSKNaftRLKRHARVRAKLSGTAERPRUWFRSinailYAQIinDVNGVTIAS^ 60 

Query: 64 KE--VSNGTKTEQaVWGKLV2iERAVaKeiSEWFDRGGYLYHGRVKaiADSARE^ 121 

K+ V •!• T A VG+LVA+RA KGIS+WPDRGGYIiYHGRVKALfiD+ARE GLKF 
Sbjct: 61 El)L]WESTGDTSAATKVGELVAKRAAEKGISDVVPDRGGYLYHGRVKAI>aiB\AREAGLKF 120 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6341> which encodes the amino acid 
sequence <SEQ ID 6342>. Analysis of this protein sequence reveals the following: 

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 116/121 (95%) , Positives = 120/121 (98%) 

Query: 1 MKIVISKPDKNKIRQKRHRRVRGKLSGTADRPRLNIFRSMTGIYAQVIDDVAGVTLASAS 60 

+KIVISKPDKNKIRQKRHRRVRGKLSGTADRPRIiN+FRSNTGIYAaVIDDVAGVTIjASAS 
Sbjct: 1 WIVISKPDKNKIRQKRHRRVEGIOjSGTADRPRIOTFRSISrrGIYaQVIDDVAGVTLASAS 60 

Query: 61 TLDKEVSNGTKTEQAVVVGKLVRERAVAKGISEWFDRGGYLYHGRVKRIAPSAREJTO 121 
TLDK+VS GTKTEQAVWGKLVAERAVAKGISE\A/FDRGGYLYHGRVKALAD+ARENGLKF 

Sbjct: 61 TLDKDVSKGTKTEQAVWGKLVAERAVAKGISEWFDRGGYLYHGRVKALADMO^ENGLKF 121 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2053 

A DNA sequence (GBSx2164) was identified in S.agalactiae <SEQ ID 6343> which encodes the amino 
acid sequence <SEQ ID 6344>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1530 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22700 GB:M57622 ribosomal protein L6 [Bacillus 
stearothermophilus] 
Identities = 108/178 (60%) , Positives = 133/178 (74%) 

Query: 1 MSRIGNKVITLPASVEIINKDNVVTVKGPKBQLTREFNKNIGITVEGTEVTVTREN^ 60 , 

M R+G K I +PAGV + H VTVKBPKG+LTR F+ ++ ITVEG +TVTRP+D k' 
Sbjct: 1 MXRVGKKPIElPAGVTVT\mGmVTVKGPKGELTRTFHPDMTITVEG]!WITVTRPOT 60 

Query: 61 MKTIHGTTRaNLNNMWGVSEGFKKALEMRGVGYRAQLQGSKLVLSVGKSHQDEVEAPEG 120 

+ 4-HGTTR+ L NMV GVS+G++KALE+ GVGYRA QG KLVLSVG SH E+E EG 
Sbjct: 61 HRALHGTTRSLLANMVEGVSKGYEKALELVGVGYRASKQGKKEiVLSVGYSHPVEIEPEEG 120 

Query: 121 TOFEOTTPTTIimiGINKESVGarAAYVRSIRSPEPYKGKGIRYVGEFVKRKEGKrGK 178 
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+ KVP4 T I V G +K+ VG+ AA. +R++R PEPYKGKS3IRY GE VR KEGKTGK 
■ Sbjct: 121 LEIEVPSQTKIIVKGADKQRVGELAaNIRAVRPPEPyKGKGIRYBOELVRI.KEGKrGK 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6345> which encodes the amino acid 
sequence <SEQ ID 6346>. Analysis of this protein sequence reveals the following: 

Possible site: 17
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1704 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/178 (85%), Positives = 166/178 (92%)



Sbjct: 1 

Query: 61 ^TO•IHGT^RAMLNN^OTGVSEGFKKALE^IRGVGYRAQLQGSKLVLSVGKSHQDE^^ 120 

MKTIHGTTRANLNNMWGVSEGFKK LBM+GVGYRAQLQG+KLVLSVGKSHQDEVEAPEG 
Sbjct: 61 MKTIHGTTRaNIJWWGVSEGFKTOLEMKGVGYRAQLQGTKLVLSVGKSHQDEVEaPEG 120 

Query: 121 VTFEVPTPTTINVIGINKESVGQTAAYVRSLRSPEPYKGKGIRYVGEFVRRKEGKTGK 178 

+TF V PT+I+V GINKB VGQTAAY+RSLRSPEPYKGKGIRWGE+VR KEGKTGK 
Sbjct: 121 ITFTVANPTSISVEGINKEWGQTAAYIRSLRSPEPYKGKGIRYVGEYVRLKEGKTGK 178 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 2054 

A DNA sequence (GBSx2165) was identified in S.agalactiae <SEQ ID 6347> which encodes the amino 
acid sequence <SEQ ID 6348>. This protein is predicted to be 30S ribosomal protein S8 (rpsH). Analysis of 
this protein sequence reveals the following: 

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4356 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB06813 GB:L47971 ribosomal protein S8 [Bacillus subtilis]
Identities = 100/132 (75%), Positives = 116/132 (87%)



Query: 1 MVMTDPIADFLTRIRNANQAKIIEVLEVPASNIKKGIADILKREGFVKNVEVIEDDKQGII 60

MVMTDPIAD LTRIRMAN +HE IiE+PAS +K+ IA+ILKREGF4-++VE +ED KQGII 
Sbjct: 1 MVMTpPIADMLTRIRKANMVRHEKLEIPASKLKREIAEILKREGFIRDVEFVEDSKQGII 60

Query: 61 RVFLKYGQNGERVITNLB31ISKPGLRVYTKHEDMPKVLNGLGIAIVSTSEGLLTDKEARQ 120

RVFLKYGQN ERVIT LKRISKPGLRVY K ++P+VLNGLGIAI+STS+G+LTDKEAR 
Sbjct: 61 RVFLKYGQNNERVITGLKRISKPGlARVYAKSNEVPRVEISIGLGIAIISTSQGVLTDKEaRA 120

Query: 121 KNIGGEVLAYIW 132

K GGEVLAY-fW 
Sbjct: 121 KQRGGEVLAYVW 132
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A related DNA sequence was identified in S.pyogenes <SEQ ID 6349> which encodes the amino acid 
sequence <SEQ ID 6350>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4327 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/132 (92%), Positives = 129/132 (97%) 



Query: 1
         MVMTDPIADFLTRIRNANQ KHEVLEVPASNIKKGIA+ILKREGFVKHVEVIEDDKQGII
Sbjct: 1

Query: 61
          RVFIiKYG+NGERVITNLKRISKPGLRVY K +DMPKVIiNGrjGIAI+STSEGLIiTDKEaRQ
Sbjct: 61

Query: 121
           KN4-GGEV+AY+W
Sbjct: 121

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2055 

A DNA sequence (GBSx2166) was identified in S.agalactiae <SEQ ID 6351> which encodes the amino 
acid sequence <SEQ ID 6352>. This protein is predicted to be ribosomal protein S14 (rpsN). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3833 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11905 GB:Z99104 ribosomal protein S14 [Bacillus subtilis]
Identities = 47/61 (77%), Positives = 53/61 (86%)

Query: 1 MAKKSMIAIOraiPAKPSTQAyTRCKKCGRPHSVYRKFQLCRVCFRDIAYRGQVPGVTKAS 60
         MAKKSMIAK +R KF Q -XTRCE+CGREHSV RKF+LCR+CFR+LAYRGQ+PGV KAS
Sbjct: 1 MAKKSMIRKQQRTPKFKVQKyTRCERCGRPHSVIRKFKLCRICFRELAYKGQIPGVKKAS 60

Query: 61 W 61
          W
Sbjct: 61 W 61

A related DNA sequence was identified in S.pyogenes <SEQ ID 6353> which encodes the amino acid 
sequence <SEQ ID 6354>. Analysis of this protein sequence reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4747 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 55/61 (90%), Positives = 59/61 (96%)

Query: 1 MaKKSMIAKNKRPAKFSTQAYTRCEKOSRPHSVYRKFQLCRVCFI^ 60 

+aKKSMIAKNKREAK STQAXTRCERCXSRPHSVreKP+LCRVCFR+IiAYKEQ+PGV KftS 
Sbjct: 1 LaKKSMIAKNKRPAKHSTQA.YTRCEKOTRPHSVYRKFKLCRVCFREIAYKGQIPGVVKAS 60 

Query: 61 W 61 

W 

Sbjct: 61 W 61 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2056 

A DNA sequence (GBSx2167) was identified in S.agalactiae <SEQ ID 6355> which encodes the amino 
acid sequence <SEQ ID 6356>. This protein is predicted to be 50S ribosomal protein L5 (rplE). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1845 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3 NRLKi:KyTNEVVPALTEKPNYSSVmVPKVEKIVI»GVGDAVSNAKNLEKAAAELALIS 62 

NRLKEKY E+VP+LTEKPNYSSVMAVPK+EKIV+lilMGVGnAV NAK L+KA EL 1+ 
Sbjct: 2 NRLKEKSQKEIVPSLTEKFIWSSVMAVPKLEKIVVNMGVGDAVQNAKALDKAVEELTEIT 61 

Query: 63 GQKPLITKAKKSIAGFRLREGVAIGAKVTIjRGERMVEFLDKLVSVSLPRVEDFHGVPTKS 122 

GQKP+ITKaKKSIAGF+LREG+ IGAKVTLRGERMYEFLDKL+SVSLPRVEDF G+ K+ 
Sbjct: 62 GQKPIITKAKKSIAGFKLREtMPIGAKVTLRGERMYEFLDKLISVSLPRVRDFRGISKKA 121 

Query: 123 FDGRGimLGVKEQLIFPEINFDDVDKVRGriDIVIVTTANTDEESREI.LKGLGMPFJUC 180 

FDGRGNYTIiGVKEQLIFPEI++D VDKVRG+D+VrVTTA+TDEE+RELL +GMPF K 
Sbjct: 122 PDGRGNYTrjGVKEQLIFPEIDTOKVDKVRGMDVVIVTrASTDEEARELLSQMGMPFQK 179 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6357> which encodes the amino acid 
sequence <SEQ ID 6358>. Analysis of this protein sequence reveals the following: 

Possible site: 48
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1793 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 177/180 (98%), Positives = 180/180 (99%)

Query: 1 MaNRLKEKYTKEVVPALlEKElsrySSVMAVPKV^ 60
         MANRLKEKTTNEV+PALTEKEm+SVMftVPKVEKIViaMSVGDAVSH^^
Sbjct: 1 MANRIiKEKYITOVIPJU^IEKEim'SVmVPKVEKIVIiMJ^ 60

Query: 61 ISGQKPLITKAKK3IAGPRLREGVAIGBJm?LRaERN^^ 120
          ISGQKPLITKAKKSIAGFRIjmSVAIGRKOTLRGERMYEFLDKLVSVSLPRVRDFHGVPT
Sbjct: 61 ISGQKPLITK&KKSIAGERLREGVaiGSUCVTLRGERMYEBTJDKLVSVSLPRVRDFHGVPT 120

Query: 121 KSFIX3ROTmX3VKEQr,IFPEIOTDim)KVRGLDIVIVTTMSITDEESREIiLKGM^ 180
           KSFDGRGim'LGVKEQLIFPEI+FDDVDKUEGLDIVIVTTSNTDEESRELLKGLGMPFAK
Sbjct: 121 KSFDGRGNYTLGVKEQLIFPEISFDDVDKVRGLDIVIVTTAHTDEESRELLKGLGMPFAK 180



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2057 

A DNA sequence (GBSx2169) was identified in S.agalactiae <SEQ ID 6359> which encodes the amino 
acid sequence <SEQ ID 6360>. This protein is predicted to be 50S ribosomal protein L24 (rplX). Analysis 
of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1850 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD33285 GB:AF126061 RpL24 [Streptococcus pneumoniae]
Identities = 89/101 (88%), Positives = 94/101 (92%)

Query: 1 MFVKKGDKVRVIAGKDKBTEAVVLK3UiPKVNKyVVEGV2UiIKKHQKEN^^ 60 

MFVKKGDKVRVIAGKDKGTE&WL ALPKVNKV+VEGV ++KKHQ+P NE PQG I+EKE 
Sbjct: 1 MFVKKGDKVRVIAGKDKGTEAVVLTALPKVNKVIVEGVNIVKKHQRPTNEDPQGGIIEKE 50 

35 

Query: 61 APIHVSNVQVLDKNGVAGRVGyKWDGKKVRYNKKSGEVLD 101 

A IHVSNVQVLDKNGVAGRVGYK VDGKKVRYNKKSGEVLD 
Sbjct: 61 I^IHVSHVQVIiDKE5GV3«3RVGYKFVDGKKVR-!mKKSGEVIiD 101 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6361> which encodes the amino acid 
sequence <SEQ ID 6362>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1850 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 95/101 (94%), Positives = 99/101 (97%) 

Query: 1 MFVKKGDKVRVIAGKDKGTEAVVLKALPICVNKyVVEGVALIKKHQKPimENPQQaiVEK^ 60 
MFVKKGDKVRVIAGKDKGTEAWLKALPKVNKV+VEGV +IKKHQKEN ENP(2GAIVEKE 
55 Sbjct: 1 MFVKKGDKWVIAGKDKGTEAVVLKALPKV]eCVIVEGVGMIK3CHQKI>N^^ 60 

Query: 61 APIHVSKVQVLDKNGVAGRVGYKVVDGKKVRYNKKSGEVLD 101 
APIHVSNVQVLDK^K3mGR+GYKWnGKKVRY+KKSGEVLD 
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Sbjct: 61 APIHVSNVQVLDKNGVAGRIGYKWDGKKVRYSKKSGEVLD 101 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2058 

A DNA sequence (GBSx2170) was identified in S.agalactiae <SEQ ID 6363> which encodes the amino 
acid sequence <SEQ ID 6364>. This protein is predicted to be 50S ribosomal protein L14 (rplN). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MIQQETRLKm)NSGaREILTIKVLGGSGRKFMfIGDVIVMVKQa,TPGGRVKKEDVVKA 60 

MIQ ETRLKVAmSSAREILTIKVLGGSGRKFflNIGDVIVRSVKQATPGGRVKKGDVV^ 
Sbjct: 1 MIQTETRLKVMNSGRRElLTIKVLGGSGRKFJVNIGDVIV/iSVKQATPGGAVKKGDVVKA 60 

Query: 61 VIVRTKTGARRPDGSYIKFDDNAAVIIRDDKTPRGTRIFGPVARELREGGVMKIVSLAPE X20 

VIVRTK+GARR DGSYIKFD+MAAVIIR+DKIPRGTRIFGPVARELREGG+MKIVSLAPE 
Sbjct: 61 VIVRTKSGARRADGSYIKFDElSaAVIIREDKTPRGTRIPGPVfllffilJiEGGEMKIVSiajPE 120 

Query: 121 VL 122 

VL 

Sbjct: 121 VL 122 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6365> which encodes the amino acid 
sequence <SEQ ID 6366>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/122 (100%), Positives = 122/122 (100%)

Query: 1 MIQQETRLKVaDNSGaffiEILTIKVLGGSORKFANIGDVIVaSVKmTPGGAVKKGDVVKA 50 

MIQQBTRLKV6DNSG2UlEILTIKVLQGSGRKFANIGWIVaSVKQATPGGRVKKiGDVV^ 
Sbjct: 1 MIQQETRLKVAnKSGaUlEILTIKVIfiGSGRKFANIGDVIVASVKQATPGGAVKKGDVVKA 60 

Query: 61 VIVRTKTGARRPIXSSYIKFDniiaAVIlRmKTPRGTRIFGPVAREIiREGGY^ 120 

VIVRTKTSJiRRPDCSSYIKFDnKaAVIIRIJDKTPRGTRIFaPVaREIiREGGY^ 
Sbjct: 61 VrURTKTOARRPDGSYIKFDDNAAVIIRDDKTPRGTRIFGPVAHELREGGYMKIVSLAPE 120 

Query: 121 VL 122 
VL 

Sbjct: 121 VL 122

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.
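
The "Final Results" blocks above list a certainty score for each of three compartments (cytoplasm, membrane, outside). As an illustration only, the sketch below shows one way such a block could be parsed and the highest-scoring compartment picked out. The names `parse_final_results` and `top_localization` are hypothetical, and the pattern is an assumption about the line layout seen in this text rather than a specification of the prediction program's output.

```python
import re

# One "Final Results" row, e.g.
#   bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
# Optional spaces around "=" and "." are accepted because scanned text often adds them.
ROW_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, verdict) tuples found in text."""
    rows = []
    for compartment, whole, frac, verdict in ROW_RE.findall(text):
        rows.append((compartment, float(f"{whole}.{frac}"), verdict.strip()))
    return rows


def top_localization(text):
    """Return the row with the highest certainty, or None if no row was found."""
    rows = parse_final_results(text)
    return max(rows, key=lambda row: row[1]) if rows else None


if __name__ == "__main__":
    block = """
    bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
    bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
    """
    print(top_localization(block))
```

When several compartments tie, `max` keeps the first one listed, which matches the order in which the rows are printed.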

Example 2059 

A DNA sequence (GBSx2171) was identified in S.agalactiae <SEQ ID 6367> which encodes the amino
acid sequence <SEQ ID 6368>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3415 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 
Sbjot: 

Query: 61 RIMETRPLSATKRFRLVEWEKAVII 86 

RIMETRPLSATKRFRLVEWE+AVI I 
Sbjct: 61 RIMETRPLSATKRPRLVEWEEAVII 86 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6369> which encodes the amino acid 
sequence <SEQ ID 6370>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3415 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/86 (100%), Positives = 86/86 (100%)

40 MERNQRKTLYGRVVSDKmKTIOWVETKRNHPVYGKRINYSKKVKaHDENNVaK^ 

Query: 61 RIMETRPLSATKRPRLVEWEKAVII 86 
RIMETRPLSRTKRFRLVBWEKAVII 
Sbjct: 61 RIMETRPLSATKRFRLVEWEKAVII 86

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2060 

A DNA sequence (GBSx2172) was identified in S.agalactiae <SEQ ID 6371> which encodes the amino
acid sequence <SEQ ID 6372>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4329 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKLQBIKDFVKELRGLSQEELaKKEimLKKELFDLRFQaAAGQLEKTAEIJJE^^ 60 

MKL E+K+FVKEIiRGLSQEEIiAK+ENELKKELF+LRFC3aA GQIjE+TARL EVKKQIAR+ 
Sbjct: 1 MKUSIEVKEFVKEIJlGLSC3EErAKRENELKKELFEIJlPQAATGQLEQT^ 60 

Query: 61 KTVQSEMK 68 

KTVQSE K 
Sbjct: 61 KTVQSERK 68 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.
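
In the alignments shown in these Examples, the middle line repeats a residue letter where Query and Sbjct are identical and shows "+" where the substitution is scored as similar. The sketch below is an assumption-labelled illustration of how identities and positives could be counted from one such three-line block. The function name `score_match_line` is hypothetical, and the counts come from the match line only, not from a recomputed substitution matrix.

```python
def score_match_line(query, match, sbjct):
    """Count identities and positives from a BLAST-style three-line block.

    query, match and sbjct must be the same length (pad the match line with
    spaces if trailing blanks were trimmed).
    Returns (identities, positives, aligned_length).
    """
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("the three lines must be the same length")
    identities = 0
    positives = 0
    for symbol in match:
        if symbol.isalpha():
            # A letter in the match line marks an identical residue pair,
            # which also counts as a positive.
            identities += 1
            positives += 1
        elif symbol == "+":
            # "+" marks a non-identical pair scored as similar.
            positives += 1
    return identities, positives, len(query)


if __name__ == "__main__":
    # Final block of the Example 2060 alignment above (residues 61 to 68).
    print(score_match_line("KTVQSEMK", "KTVQSE K", "KTVQSERK"))
```

Summing the per-block counts over all blocks of an alignment gives the figures reported as "Identities" and "Positives", provided the scanned lines are intact.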

Example 2061 

A DNA sequence (GBSx2174) was identified in S.agalactiae <SEQ ID 6373> which encodes the amino 
acid sequence <SEQ ID 6374>. This protein is predicted to be RpL16 (rplP). Analysis of this protein 
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4574 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD332S3 GB:AF126059 RpL16 [Streptococcus pneumoniae]
Identities = 135/137 (98%), Positives = 137/137 (99%)

Query: 1 MLVPKRVKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSHWITNRQIEAARIAMTRYMKR 60 

MLVPKRVKHRREFRGKMRGEAKGGKEV+FGEYGLQATTSHWITNRQIEAARIAMTRYMKR 
Sbjct: 1 MLVPKRVKHRREFRGKMRGEAKGGKEVAFGEYGLQATTSHWITNRQIEAARIAMTRyMKR 60 

Query: 61 GGKVWIKIFPHKSYTAKAlGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEEVAEEAIiRL 120 

SQKVWIKIFPHKSYTRKAIGVRMGSGKXa^BGWVAPVKEGKVMFEIAGVSBB+AREIiLRL 
Sbjct: 61 GGKVWIKIFPHKSYTRKAIGVRMGSGRQAPEGWVAPVKRGKVMFEIAGVSEEIARERLRL 120 

Query: 121 ASHKLPVKCKFVKREAE 137 

aSHKLPVKCKFVKREIRE 
Sbjct: 121 ASHKLPVKCKFVKREAE 137 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6375> which encodes the amino acid 
sequence <SEQ ID 6376>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4574 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 136/137 (99%), Positives = 137/137 (99%)



Query: 1 MLVPigWKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSH'/IIITOQIEAARIAMTRYMKR 60 

MLVPKRVIOffiEEFRGKMRGESUKGGKEVSFGEXGLQATTSHWITlffiQIEAJiRIMTRTOK^ 
Sbjct: 1 MLVPKRVKHRREFRGKMRGESaCGGKEVSFGEYGLQRTTSHWITNRQlEAWlIAMTRYMKR 60 

Query: 61 GGKVWIKIFPHKSYTAKaiGVRMGSGKGAPESWVaPVKRGKVMFEIAGVSEEVARESU^ 120 

GGK^WIKIFPHKSYTaKAlGVBMGSGKSaPEEWmPVKRGKV^^ 
Sbjct: 61 GGroraiKlFPHKSYTAKAIG\7RMGSGKGM>E(3WV2iPVKRSK™FElJ^ 120 

Query: 121 ASHKLFVKCKFVKREftB 137 

ASHKLPVKCKFVKREftE 
Sbjct: 121 ASHKLPVKCKFVKREAE 137 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2062 

A DNA sequence (GBSx2175) was identified in S.agalactiae <SEQ ID 6377> which encodes the amino 
acid sequence <SEQ ID 6378>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3758 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

Query: 10 MRVGIIRDVJDAKWyAEKEYAD'XI.HEDLAIRKPINKELADASVSTIEIERAVNKVIVSLHT 69
MRVGIlRDWDAKWyAEKEYADYLHEDIiAIRKF+ KELADA+VSTIEIERAVNKV VSLHT
Sbjct: 1 MRVGiaRDWDAKWyAEKEYADYLHEDIiAlRKFVQKELADAAVSTIEIERAVNKVNVSLHT 60

Query: 70
AKPGMVIGKGGflNVDALR +LNKIiTGKQVHINIIEIKQPDLiaAHLVGE lARQLEQRVAF
Sbjct: 61

Query: 130
RRflQKQAIQR MRASaKGIKTQVSGRLKIGftDIARAEGYSEGTVPIjHTLRADIDYAWEEAD
Sbjct: 121

Query: 190
TTYGKLGVKVWIYRSEVLPARKNTiaSGK
Sbjct: 181



A related DNA sequence was identified in S.pyogenes <SEQ ID 6379> which encodes the amino acid
sequence <SEQ ID 6380>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3758 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 PCT/GB01/04789

Example 2063

A DNA sequence (GBSx2176) was identified in S.agalactiae <SEQ ID 6381> which encodes the amino 
acid sequence <SEQ ID 6382>. This protein is predicted to be 50S ribosomal protein L22 (rplV). Analysis
of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2704 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33279 GB:AF126061 RpL22 [Streptococcus pneumoniae]
Identities = 99/114 (86%), Positives = 106/114 (92%)

Query: 1 MREITSAKAMARTVRVSPRKTRLVLDIilRGKNVADAIAILKFTPNKAARVIEKTIiNSAIA 60 

MAEITSAKAMARTVRVSPRK+RLVLD IRGK+VADAIAIL FTPHKAA +1 K LNSA+A 
Sbjct: 1 MAEITSAKA^laRTVRVSPRKSRLVLDNIRGKSVaDAIAILTFTPNKaAEIILKVLNSAVA 60 


Query: 61 NAENNFGIiEKaNLWSETEAflEGPTMKRFRPEAKGSaSPIIIKRT^^ 114 

NAENNPGL+KANLVVSE PANEGPTMKRFRPRAKGSASPINKRT H+TV V+EK 
Sbjct: 61 NRENNFGLDKaNDWSBaFANEGPTMKRFRPRAKGSASPINKRTA^ 114 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6383> which encodes the amino acid
sequence <SEQ ID 6384>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2794 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 113/114 (99%), Positives = 113/114 (99%)

Query: 1 MaBITSAKAMARTVRVSPRKTRLVICLIRGKNVJmiAII^TENKaaRVIEKTEM 60 
MABITSAKAMftRTVRVSPRKTRLVLDLIRGK VADAIAILKFTPNKaARVIEKTLHSAIA 
40 Sbjct: 1 MAEITSAKAMARTORVSPRKTRLVIiDLIRGKKVAnAIAILKFTPNKAARVIEKTLNSAIA 60 

Query: 61 NftBNNFGLEKftNLVVSETPANEGPTMKRERPRAKQSASPINKRTTHVTVVVSEK 114 

NAENNFGIjBKaNLVVSETFAHEGPTMKRFRPRAKGSASPIKKRTTHVTVVVSEK 
Sbjct: 61 imEtTOFGLEKftNLVVSETPANEGPTMKRFRPRAKBSASPINKRTrHVT^ 114 


Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 2064 

A DNA sequence (GBSx2177) was identified in S.agalactiae <SEQ ID 6385> which encodes the amino 
acid sequence <SEQ ID 6386>. This protein is predicted to be 30S ribosomal protein S19 (rpsS). Analysis
of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2991 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to ribosomal protein S19 from S.pneumoniae. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6387> which encodes the amino acid
sequence <SEQ ID 6388>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3319 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/92 (100%), Positives = 92/92 (100%)

Query: 1 MGRSLKKGPFVDEHLMKKVEAQaNDEKKKVIKTWSRRSTIFPSFIGYTIAVYDGRKHVPV SO 

MGRSLKKGPFVDEHimKVEAQm)EKKIWIKTWSRRSTIFPSFIGYTIAVYDGRKHVPV 
Sbjct: 19 MGRSLKKGPFVDEHLMKKVEAQRNDEKKKVIKTWSRRSTIPPSFIGYTIAVYDGRKHVPV 78 

Query: SI YIQEDMVGHKLGEFAPTRTYKGHAADDKKTRR 92 

YIQEDMVGHKLGEFAPTRTYKGHAADDKKTRR 
Sbjct: 79 YIQEDMVGHKLGEFAPTRTYKGHAADDKETRR 110 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2065 

A DNA sequence (GBSx2178) was identified in S.agalactiae <SEQ ID 6389> which encodes the amino 
acid sequence <SEQ ID 6390>. This protein is predicted to be L2 (rplB). Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3182 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45959 GB:U43929 L2 [Bacillus subtilis]
Identities = 208/277 (75%), Positives = 239/277 (86%)

Query: 1 MGIKVYKPTTNGRHNMTSLDFAEITTOT'PEKSLLVSLKNKftGiamNGRITVra 60 

M IK YKP++NGRR MT+ DFAEITT+ PEKSLL L K GRNN G++TVRHQGGGHKR 
Sbjctt 1 MAIKKYKPSSNGREGMTTSDFAEITTDKPEKSIiI^LHKEGGRHNQGKIi 60 

CJuery: 61 HYRLIDFKRNKDCJVEAVVKTIEYDPNRTANIALVHyrDGVKAYIIAPKiGtEVGQRIISGP 120 ■ 

YR+IDFKR+KDG+ V T+EYDPNR+ANIAL++Y DG K YIIAPKjG++VG ++SGP 
Sbjct: 61 QYRVIDFKRDKDGIPGRVATVEYDPNRSANIALINYAIXSEKRYIIJ^KjGIQVGTEVMSGP 120 

Query: 121 ERDIK7GNRLPI^IPVGWIHNIELQPGKGREIaIRR3V3RSAQVLGQEGKX\rt.VHliQSGE 180 

EADIKVGNALPI. NIPVGTV+HNIEL+PGKG +L+R+AG SAQVLG+EGKYVLVEL SGB 
Sbjct: 121 ERDIKyGNRnPIiINIPVGTV\niNIEi:)KPGKGGQLVRSA6TSAQVLGKEGKX\rLVKIJISGE 180 

Query: 181 VRMILGTCRATIGTVGNEQQSLVNIGKAGRHRMKGVRPTVRGSVMNPNDHPHGGGEGKRP 240
Sbjct: 181 VEMILSACRASIGQVGlTOQHELINIGKa.GRSRWKGIRPTVRGSWINPNDHPHGGGEGRAP 240

Query: 241 VGRKlVPSTPVreKPJUIKSLKTRNKKaKSDKLIVRRRNQK 277
+GRK+P +PWGKP LG KTR KK KSDK IVRRR K
Sbjct: 241 IGRKSPMSPWGKPTLGFKTRKKKNKSDKFIVRRRKNK 277

A related DNA sequence was identified in S.pyogenes <SEQ ID 6391> which encodes the amino acid
sequence <SEQ ID 6392>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2560 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 264/277 (95%), Positives = 276/277 (99%)


Query: 1 MGIKVYKPTTOSRRimTSLDFJffilTTOTPEKSLLVSLKNKAGRHira 60 

+GlKVYKPTTtK3RRimTSIJ3FAEITT+TPEKSLLVSLK+KAGRNM!IGRITWHQGGGHKR 
Sbjct: 1 VGIKVYKPTTNGRKlilMTSLDFAEITTSTPEKSLLVSLKSKAGRNNNGRITVRHQGGGHKR 60 

25 Query: 61 HYRLIDFKENKDGVEAWKTIEYDPNRTAKIALVHYTDGVKAYILAPKGLEVGQRIISGP 120 

HYRLIDFKRNKDGVEAVVKTIEYDPNRTANIALVHYTDGVKAYI+APKGLEVGQRI+SGP 
Sbjct; 51 HYRLIDFKRNKDGVEAWKTIEYDPNRTANIALVHYTDGVKAYIIAPKGLBVGQRIVSGP 120 

Query: 121 EftDIKVGNALPLflNIPVGTVIHNIELQPGKGAELIRAAGASAQVLGQEGimrLVRLQSGE 1,80 
30 +ADIKVGNMiPL?SNIPVGTV+HNIEL+PGKG EL+RftAGASAQVLGQEGKYVIiVRLQSGE 

Sbjct: 121 rfflDIKVGNaLPLaNIPVGTVVHNIELKPGK(MEL\nJAaGaSAQVLGQBGK^ 180 

Query: 181 VRMIIXSTCRaTIGTVGNEQQSLVNIGKAGRireWKGVRPTVRGSVMNENDHPHGGGEGKRP 240 
VRMILGTCRATIGTVGISffiQQSLWIGKAGR+RWKG+RPTVRGSVMNEKlDHPHGGGEGK^ 
35 Sbjct: 181 VRMII/STCRATIGTV^GNEQQSLWIGKAGRSRWKBIRPTVRGSVMNENDHPHGGGEGIOiP 240 

Query; 241 VGRKAPSTPMGKPALGLKTRNKKAKSDKLIVRRRNQK 277 

VGRKAPSTPWGKPALGLKTRNKKAKSDKLIVRRRN+K 
Sbjct: 241 VGRKAPSTPWGKPALGLKTRNKKAKSDKLIVRRRNEK 277 


Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2066 

A DNA sequence (GBSx2180) was identified in S.agalactiae <SEQ ID 6393> which encodes the amino 
acid sequence <SEQ ID 6394>. This protein is predicted to be 50S ribosomal protein L23 (rplW). Analysis
of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1669 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB03858 GB:AP001507 ribosomal protein L23 [Bacillus halodurans]
Identities = 56/92 (60%), Positives = 67/92 (71%), Gaps = 1/92 (1%)

Query: 2 NLYDVIKKPVITEKS^WAIEAGKyTFEVDTRAHKLLIKQA^/EAAET)GVKVASVJm7T^ 61

N DVIK+PVITE+S + KYTFEVD RA+K IK A+E FD VKVA VNT+ K 
Sbjct: 3 NARDVIKRPVITERSTBVMGDKKYTFEVDVRflTOCTQIKDAIEEIED-VKVAKVNTM^ 61 

Query: 62 KAKRVGRYTGFTSRrKKAIITLTADSKAIELP 93 

K KR GRYTGFT++ KKAI+TLT DSK ++ F 
Sbjct: 62 KPKRFGRYTGFTARRKKAIVTLTPDSKELDFF 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6395> which encodes the amino acid 
sequence <SEQ ID 6396>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1617 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 95/98 (97%), Positives = 97/98 (98%)

Query: 1 MNLYDVIKKPVITEKSMWALBRGKYTFEVDTRSHKLLIKQAVEAAFDGVKWRSVN^^ 60 

MrajYDYIKKFVITEKSMH-aLEAGKrrFEVDTRAHKLLIKQAVEAaFDGVKVASVNTV VK 
Sbjct: 1 MNLYDVIKKPVITEKSMIMjEAGKYTFEVDITUUlKLLIKQAVEAAFDGVKVaSVNTVN^ 60 

Query: 61 PKAKRVGRYTGFTSKTKKRIITLTADSKAIELEftAEftE 98 

PKAKRVGRYTGFTSKTtCKAIITLTADSKAIEIiEAflEfiE 
Sbjct: 61 PKAKRVGRVTGFTSKTKKailTLTADSKAIELEAaEAE 98 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2067 

A DNA sequence (GBSx2181) was identified in S.agalactiae <SEQ ID 6397> which encodes the amino 
acid sequence <SEQ ID 6398>. This protein is predicted to be 50S ribosomal protein L4 (rplD). Analysis of
this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.54 Transmembrane 140 - 156 ( 139 - 156)

Final Results

bacterial membrane --- Certainty=0.1617 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MANVKLFDQTGKEVSSVEENEAIFGIEPNESWFDWISQRASLRQGTHAVKNRSAVSGG 60 

M V L++Q G +EIjN ++FGISPNESWFD -■+ QRASLRQGTH VKNRS V GG 

Sbjct: 1 MPK^ALYNQNGSTAGDIEmASVFGIEPISESWFDAILMQRASLRQGTHKVKNRSEVRGG 60 

Query: 61 GRKPVTOQKGTGRARQGSIRSPQWRGGGWTOPTPRSYGYKLPQKURELALKSVYSAKVAE 120 

GRKPWRQKGTGRARQGSIRSPQWRGGGWPGPTPRSY YKLP+KVRRLA+KSV S+KV + 
Sbjct: 61 GRKPfTOQKBTGRARQGSIRSPQWRGGGVVPGPTPRSYSYKLPKKVRRIAIKSVLSSKVID 120 



Query: 121 DKIT/AVENLSFAAPK3MFASVLSaLSIDSKVLVILEEGNEFAaiiSJU?NLPlWT^ 180 
+ + +E+L+ KT E A++L LS++ K !.++ + NE AISAKN+P VTV A
Sbjct: 121 lWIIVLEDLTIiDTAKTKEMAAILKGLSVEKKMiIVTADJiNEftV?U^SaRNIPGVTVVEaNG 180



Query: 181 ASVLDIVNADKLLVTKEAISTIEGVLA 207 
H-VLD+VN +iajL+TK A+ +B VLR 
Sbjct: 181 imrLDVTOHBKLLITKRAVEKVEEVIA 207



A related DNA sequence was identified in S.pyogenes <SEQ ID 6399> which encodes the amino acid 
sequence <SEQ ID 6400>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2544 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/207 (96%), Positives = 203/207 (97%)

Query: 1 MiUJVKLFDQTGKEVSSVEIjNEAIFGIBENESWFDWISQRASI^ 60
MANVKIlFDQTC3KE^SSVEUI+AIFGIEP^ffiSVVFDWISQRASLRCX3TE^AVK^
Sbjct: 1 MfiNVKLFD(3TGKEVSSVELNDAIFGIEPNESWPDWISQRASLRQGTHAVKNRSAVSGG 60

Query: 61
Sbjct: 61

Query: 121
DKFVAVE LSFAAPKTAEFA VLSALSID+KVIV ++EEGNEFAALSARNLPNVTVATA T
Sbjct: 121

Query: 181
ASVLDIVNADKLLVTKEaiSTIE VIA
Sbjct: 181

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2068 

A DNA sequence (GBSx2183) was identified in S.agalactiae <SEQ ID 6401> which encodes the amino 
acid sequence <SEQ ID 6402>. This protein is predicted to be 50S ribosomal protein L3 (rplC). Analysis of
this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2090 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45956 GB:U43929 L3 [Bacillus subtilis]
Identities = 157/208 (75%), Positives = 180/208 (86%), Gaps = 2/208 (0%)

Query: 1 MTKGILGKKVGWrQIFTESGEPIPVTVIEATPNVVI.QVKTVETDGYEAVQVaPDDKREVL 60 
MTKGILG+K+GMTQ+F E+G+ IPVTVEEA PNWLQ KT E DGYEA+Q+GFDDKRE L

Sbjct: 1 MTKGILGRKIGMTQVFAENGDLIPVTOIEAAPNVVIKJKKTAENDGyE&IQLGPDDKEEKL 60 



Query: 61 SNKPAKGHVaKaNTAPKRFIRKFKNIE--GIiEVGAELSVEQFEaGDVVDVTGTSKGKGFQ 118
SNKP KGHVAKR TAPKRF++E + +E EVG E+ VE F AG++VDVTG SKGKGFQ
Sbjct: 61 SNKPEKSHVaKaETAPKRPVKELRGVEMDAYEVGQEVK^/EIPSAGEITOVTGVSKGK^ 120 

Query: 119 GVIKRHGQSRGHyffiHGSRYHRRPGSMGPVAPNRVFKNKRLAGRMGGNRVTVQMLEIVQ^ 178 

G IKRHGQSRGPM+HGSRYHRRPGSMGPV ENRVFK K L GRMGG ++TVQtILEIV+V 
Sbjct: 121 GAIKRHGQSRGPMSHGSRYHRRPGSMGPVDPNRTOKQKLLPGIMGGEQITVQNLEIVKVD 180 

Query: 179 PEKNWLIKGNVPGRKKSLITIKSAVECA. 206 

E+lH-+IjIKCaroPGRKKSLITl-KSKVK+ 
Sbjct: 181 AERMLLLIKGNVPGRKKSLITVKSAVKS 208 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6403> which encodes the amino acid 
sequence <SEQ ID 6404>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2090 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 205/208 (98%), Positives = 207/208 (98%)

Query: 1 MTKGILGKKVGMTQIFTESGEFIPVTVIEATPNVVLQVKTVETDGYEAVQVGFDDKREVL 60 

MTKGILGKKVGMTQIFTESGEFIPVTVIEATPNVVLQVKTVETDGYEAVQVGFDDKREVL 
Sbjct: 1 MTKGII^KKVGMTQIFTESGEFIPVTVIEATPNVVLQVKTVETDGYEAVQVGFDDKREVL 60 

Query: 61 SNKPAKBHVAKANTAPKRFIREFKNIEGLEVGaELSVEQFEAGDWDVTGTSKGKGFQGV 120 

SNKPAKGHVRKM3TAPKRFIREFKN1EGLEVGRELSVEQFERGDVVDVTG SKGKGFQGV 
Sbjct: 61 SNKPAKGHVAKAIin'APiaiFIREFKNIEeLEVGftELSVEQFEaGDVVDVTGISKBKGFQCT 120 

Query: 121 IKRHGQSRGPMaHGSRYHRRPGSMGPVAPNRVFKNKRIAGRMGGNRVTVQNLEIVQVIPE 180. 

IKRHGQSRGPMAHGSRYHRRPGSMGPVRPNRWKNKRIiAGRMGGNHVWQtlLEIVUVIPE 
Sbjct: 121 IKRHGQSRGPlffiHGSRYHRRPGSMGPVAPNRVFKNKRIAGRMGGraW^ 180 

Query: 181 RNVVLIKGNVPGAKKSLITIKSAVKAAK 208 

KW+L+RGaWPGAKKSLITIKSAVKAAK 
Sbjct: 181 RNVILVKGNVPGAKKSLITIKSAVKAAK 208 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2069 

A DNA sequence (GBSx2184) was identified in S.agalactiae <SEQ ID 6405> which encodes the amino 
acid sequence <SEQ ID 6406>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.43 Transmembrane 5 - 21 ( 5 - 21)

Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.
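
Predicted membrane-spanning segments are reported above as lines of the form "INTEGRAL Likelihood = -0.43 Transmembrane 5 - 21 ( 5 - 21)". As a hedged sketch, the code below extracts the likelihood value and the residue ranges from such lines. The name `parse_transmembrane` and the dictionary keys are hypothetical, and treating the parenthesised pair as an extended range around the core segment is an assumption that this text does not state.

```python
import re

# One predicted segment, e.g.
#   INTEGRAL Likelihood = -1.54 Transmembrane 140 - 156 ( 139 - 156)
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text):
    """Return one dict per complete 'INTEGRAL ... Transmembrane ...' line in text."""
    segments = []
    for likelihood, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append(
            {
                "likelihood": float(likelihood),
                "core": (int(start), int(end)),
                # Assumed meaning: the range printed in parentheses.
                "outer": (int(outer_start), int(outer_end)),
                "core_length": int(end) - int(start) + 1,
            }
        )
    return segments


if __name__ == "__main__":
    report = """
    INTEGRAL Likelihood = -0.43 Transmembrane 5 - 21 ( 5-21)
    INTEGRAL Likelihood = -1.54 Transmembrane 140 - 156 ( 139 - 156)
    """
    for segment in parse_transmembrane(report):
        print(segment)
```

Lines whose numbers were lost in scanning do not match the pattern and are skipped rather than guessed at.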

Example 2070 

A DNA sequence (GBSx2185) was identified in S.agalactiae <SEQ ID 6407> which encodes the amino 
acid sequence <SEQ ID 6408>. This protein is predicted to be 30S ribosomal protein S10 (rpsJ). Analysis
of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3160 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:A2ffl46363 GB:Ii29637 S10 ribosomal protein [Streptococcus mutans]
Identities = 98/102 (96%), Positives = 102/102 (99%) 

Query: 1 MSMKKIRIIlLKRYEHRTIiDTAREKIVETATRTtaTVMPVPLPTER^ 60 
20 MaNKKIRIRLKRYEHRTLDTAaEKIVETATRTGA+VaGPVPLPTERSLTO^ 

Sbjot: 1 MftinCKIRIRLK2VmiRTLDTAaEKI\ffiTATRTGASVM3PTOLPTERSLYTVIRAra 60 

Query: 61 SREQFEMRTHKRLVDIINPTQKTVDftLMKLDIiPSGVNVEIKL 102 
SREQPEMRTHKRL+DI+NPTQKTVnaLMKLDLPSGVNVEIKL 
25 Sbjct: 61 SREQPEMRTHKRLIDIVNPTQKTVnaLMKLDLPSGVNVEIKL 102 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6409> which encodes the amino acid
sequence <SEQ ID 6410>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3160 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 102/102 (100%), Positives = 102/102 (100%) 

40 Query: 1 MMIKKIRIRLKAYEHRTLDTAAEKIVETATRTGATVAGPVPLPTERSLYTIIRATHKYKD 60 

MANKKIRIRLKAYEHRTLDTAaEKIVETATRTGATVACPVPLPTERSLYTI IRATHKYKD 
Sbjct: 1 MANKKIRIRLKAYEHRTLDIAlffiKIVETATRTGRTVAGPVPLPTEKSLYTIIRATHKara 60 

Query: 61 SREQPEIKTHKRLVDIINPTQKTVDftLMKLDLPSGVMVEIKL 102 
45 SEEQFEMRTHKELVDIINPTQKTVDALMKIjDLPSGVNVEIKL 

Sbjct: 61 SREQFEMRTHKRLVDIINPTQKTVDALMKLDIiPSGVNVEIKL 102 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2071

A DNA sequence (GBSx2186) was identified in S.agalactiae <SEQ ID 6411> which encodes the amino
acid sequence <SEQ ID 6412>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2538 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 2072 

A DNA sequence (GBSx2187) was identified in S.agalactiae <SEQ ID 6413> which encodes the amino
acid sequence <SEQ ID 6414>. Analysis of this protein sequence reveals the following:




INTEGRAL Likelihood =        Transmembrane 88 - 104
INTEGRAL Likelihood =        Transmembrane 304 - 320
INTEGRAL Likelihood =        Transmembrane 185 - 201
INTEGRAL Likelihood =        Transmembrane 338 - 354 ( 331 - 357)
INTEGRAL Likelihood =        Transmembrane 240 - 256
INTEGRAL Likelihood =        Transmembrane 383 - 399
INTEGRAL Likelihood =        Transmembrane 49 - 65 ( 48 -
INTEGRAL Likelihood =        Transmembrane 127 - 143 ( 121 -
INTEGRAL Likelihood =        Transmembrane 159 - 175 ( 159 - 177)

Final Results

bacterial membrane --- Certainty=0.5564 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06655 GB:AP001517 unknown conserved protein [Bacillus halodurans]
Identities = 132/423 (31%), Positives = 210/423 (49%), Gaps = 16/423 (3%)

Query: 7 IIQLAIPAMIENILQMLMGWDNYLVAQLGWAVSGVSVANNIITIYQAIF--IALGASI 64 

+ L P IE +L MLMG D +4+Q AV+ V V+N 1+ + +F +A G SI 
Sbjct: 11 LFALTWPIFIEILLHMLMGNADTLMLSQYSDDAVAAVGVSNQILAVIIVMFGFVATGTSI 70 

Query: 65 ASLIAKSLAGSKKDDAISVCSQAIFLTLLIGAVIiGIISIVFGQTFFKLLGTTKSVAQVGG 124 

L+A+ L ++++A V +1 L+ G VI.G++ I FG K + S+ Q 
Sbjct: 71 --LVAQHLGAKERENASKmVVSIGflNLIFGIVLGLLLIAFGPPILKAMQLDDSLLQEAT 128 

Query: 125 LYIAIVGGGVOTLGMLTTLGSFIJlVQGQPRLPMYVSIFVNFIJaAVLSGFAIFKVIR Y 180 

LYL IVGG V ++ T G+ LR, + MYV+I +N LN + + IF 
Sbjct: 129 LYLQIVGGFSWQSLIMTAlGAILRSHSFTKDVMYVTIGMNILNVIGNYLFIFGPPGIPVL 188 

Query: 181 GLVGVAVSTLIAELIGIGILAKyL PIKKIIKRMTWKISAQIWNLALPSA6ER 232 

G+ GVA+ST+++R 16+ +4-A L P ++KR + + +PSAGE+ 

Sbjct: 189 GVTGVALSTWSRTIGLFVIAILLYKRIRGELPFAYLLKRFPRVELRNLLKIGIPSAGEQ 248 

Query: 233 LMMRAGDWIVAIWQLGTNWAGNAIGETLTQENYMPGLGIATATIILTAKYVGQKNRE 292

L A +VI + +GT + +LF+++I TIL V6K + 

Sbjct: 249 LSYNASQLVITYFIAMMGTEALTTKVYTQNLMMFVFLFAVAIGQGTQILIGHQVGAKQIQ 308 

Query: 293 SIEETIQSSYYIGLVLMILISSFMLLAGKPLTQLFTNMPSAIKGSLIVILLSFVGVPATI 352 

+ S +1 + + + ++ PL +ET+NP + ++LL+ + P 

Sbjct: 309 AAYVRCFRSLWIAMTVSVSMAWFFAPSTPLLGIFTDNPDILSLGTTLLLLTIILEPGRA 368

Query: 353 GTLWTAAWQGLGNAKLPFYTTTIGMWLlRWniiGYLIXSIVPEICI^VWiyR 412

LV+++G+KPY +MWIV + YI1LG+ LGL+GVW+A IAD FR 
Sbjct: 369 ClSILWISSLRftAGDVK5'PVYLaiVSMWGIAVPIAYIJ;CLPLGLGLIGVWIAFIJU3EWFR^ 428 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6415> which encodes the amino acid
sequence <SEQ ID 6416>. Analysis of this protein sequence reveals the following:



Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5   Transmembrane     - 105 ( 85 - 108)
INTEGRAL Likelihood = -4.  Transmembrane     - 321 ( 302 - 322)
INTEGRAL Likelihood = -3   Transmembrane     - 177 ( 161 - 180)
INTEGRAL Likelihood = -3   Transmembrane     - 208 ( 139 - 208)
INTEGRAL Likelihood = -3.  Transmembrane     - 145 ( 128 - 151)
INTEGRAL Likelihood = -3.  Transmembrane     - 258 ( 240 - 258)
INTEGRAL Likelihood = -2   Transmembrane     - 394 ( 377 - 394)
INTEGRAL Likelihood = -2   Transmembrane     - 355 ( 338 - 358)
INTEGRAL Likelihood = -2.  Transmembrane     - 74 ( 58 - 75)
INTEGRAL Likelihood = -2   Transmembrane     - 48 ( 32 - 49)

Final Results

bacterial membrane --- Certainty=0.3102 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases:

>GP:BAB06655 GB:AP001517 unknown conserved protein [Bacillus halodurans]
Positives = 214/435 (48%), Gaps = 14/435 (3%)

CJuery: 9 IPSLBLPSMIBNILQML^O^VDNyLVAQIGLVAVSGVSIANNIISIyQSLPIALGAAVSS 68 

+F+L P IE +L MLMG D +++Q AV+ V ■(-+» +F + S 

Sbjct: 11 LFALTWPIFIEILLHMU03RDTimiSQySDDA\ffiAVC3VSNQXLRVIIVMFGF^ 70 

Query: 69 LIARSIGElSWQNKjQIJlIYMAGVLQVTLLLSVGLGLLSVaGHHQVLEWI^^ 128 

L+A+ -K3 + + L+ + LGLL +A +L+ + + S+ Y 

Sbjct: 71 LVJiQHLGaKERENAGKUAWSIGANLIFGIVLGLLLIAFGPPILKaMQLDDSLLQEATLY 130 

Query: 129 LSIVGGMrVSIfiLLTSICaiVEAQCOTiaPMaVSLLINl7USIAIFS24LSIY VMGFGL 184 

L IVS3 V L+ + Gai+R+ + K M V++ +N+IiN I + L 1+ + G+ 
Sbjct: 131 LQIVGGFSWQSLIMTAGAILRSHSFTKDVMYVTIGMNILNVIGNYLFIFGPFGIPVLGV 190 

Query: 185 LGWRWATVLSRLVGVFLLCQF ---IPIKQVAKRLMRPLDKIIFDLSLPAaGERLM 236 

GVA +TV+SR +6+F++ +P + KR R + + + +P+AGE+L 

Sbjct: 191 TGVftLSTWSRTIGLFVIAlLLYKRIRGELPEAYLLKRFPRVELRNLLKIGIPSftGEQLS 250 

Query: 237 MRAGDVLIlGIVVRFGTTALAGKAIGEIl,TQFNYMPGLAmTATIlL\ffiRQLGGGKVTEI 296 

A ++I + GT AL + L F ++ H-A+ T 1L+ Q+G ++ 

Sbjct: 251 YNASQLVITYFIAlMGTERLTTKOTTgSlLMMFVFLFATrailGQGTQILIGHQVGAKQIQaA 310 

Query: 297 RYIIREAFILSTLMMLVMQM:.TYLLGPSLLPLFTQMTDAQRSftMlVLLFSLLGaPATAGT 356 

+ ++ + + M + + LL +FT N D +LL +++ P A 

Sbjct: 311 YVRCFRSLWIAMTVSVSMAWFFAFSTPLLGIFTDNPDILSLGTTLLLLTIILEPGRACW 370 

Query: 357 LWTAWQGLGKAXnPFYATTIGM-Ti^lRIGLGYVIGir^QYGIjIGVliWIATVLDNTSRWFI 416 

LV+ + GKPY +MWI + + Y++G+ GLIGVW+A + D R + 
Sbjct: 371 LWISSLRAAGDVKFPVYLAIVSMWGIAVPIAYLLGLPLGLGLIGVWIAFIADEWFRGLL 430 

Query: 417 LSKHFK--KYQEITF 429 

+ ++ K+QE++F 
Sbjct: 431 V

An alignment of the GAS and GBS proteins is shown below.

Identities = 219/418 (52%), Positives = 315/418 (75%)

Query: 5 KEIIQIAIPMIENILQMLMC3VVDOTLVAQLGWftVSGVSVANNIITIYQAIFIALGftSI 64 

++I LA+P+MIENILQMLMG+VDNYLVAQ+G+VAVS6VS+ANNII+IYQ++FIALGA++ 
Sbjct: 7 RKIFSIM.PSMIENILQMriMGMVDNyLVAQIGLVAVS6VSiaNNIISIYQSLFIALGftAV 66 

Query: 65 ASIJJUCSLaGSKKDraiSVCSQAIFLTLLIGAVLGIISIVFGQTFFKLLGTTKSVRQV^ 124 

+SL+A+S+ +++++ + + +TLL+ LG++S+ + LG SV VGG 

Sbjct: 67 SSLIARSIGEMNQNKQUSIYimGVLQVTLLLSVGMLLSVAGHHQVLEJWIiC^ 126 

Query: 125 LYLAIVGGGWTLGMLTTLGSFUIVQGQPRLPMYVSIFVNFLNAVLSGFAIFEWRYGLVG 184 

YL+IVGG +V+LG+LT+LG+ +R QG P++PM VS+ +N I1NA+ S +1+ W +GL+G 
Sbjct: 127 QYLSIVGGMIVSLGIiLTSLGAIVRAQGYPKIPMQVSLLINOTJJAIFSALSIYVWGFGLLG 186 

Query: 185 VAVSTLIARLIGICIIAKYLPIKKIIKRMTWKISAQIWNLALPSAGERLMMRAGDVVIVA 244 

VA +T+++RL+G+ +L +++PIK++ KR+ + I++L+LP+AGERLMMRAGDV+I+ 
Sbjct: 187 VAWATVLSRLVGVFLLCX3FIPIKQVaiaUJ4RPLDKIIFDLSLPAAGERLMMRAGDVLIlG 246 

Query: 245 IWQLGTOTVRGKAIGETLTQFimiPGMIATATIILTAKYVGQK^ 304 

IW+ GT +AGNA.IGETLTQFNYMPGL +ATATIIL A+ +G I 1+ ++ + 

Sbjct: 247 IVTOFGTTALAGNaiGETLTQBTmiPGIAMATATIILVftRQLGGGKVTO 306 

Query: 305 GLVIWILISSFML]JiGKPLTQLFTNNPSAIRGSLIVir.IiSFVGVPATIGTLVYTAAWaGL 364 

++M+++ + L G L LFT N A + ++IV+L S +G PAT GTLVYTA WQGL 
Sbjct: 307 STLMMLVMGALTYLLGPSLLPLFTQNTDAQRSAMIVLLFSLLGAPATAGTLVYTAVWQGL 366 

Query: 365 GNMIiPFYTTTIGMMLIRVVI/3YLl/3IVFEIX3LLGVWMATIADNIFRWLPLKVHYHRY 422 

G AKLPFY TTIGMW+IR+ LGY++6+V++ GL+GVWMAT+ DN RW L H+ +Y 
Sbjct: 367 GKfiJKLPEYATTIGMWVIRIGLGYVIGVVWQYGLIGVWNIATVLDmSRWFILSKHFKKY 424 
Identities = 48/211 (22%) , Positives = 89/211 (41%) , Gaps = 29/211 (13%) 

Query: 213 MTWOSAQIWNIJUiPSAGERLMMRAGDWIVAIVVQLGTNVy^ 272 

M + +I++LftLPS E ++ +V +V Q+G V+G +1 + + 

Sbjct: 1 MIYimreRKIFSLBiPSMIENILQtMGMVIMraVftQIGLV^^ 60 

Query: 273 GIATATIILTAKYVGQKNRESIEETIQSSYYIGLVLMILISSFML L 318 

+ A L A+ +G+ N4- Q +Y G++ + L+ S L L 

Sbjct: 61 ALGAAVSSLIARSIGENNQNK QENYMAGVLQVTLLLSVGLGLLSVaGHHQVLEWL 115 

Query: 319 AGKSLTQLFIWPSAlKGSLIVILLSFVGVPATIGTLVYTAAWQGLGimKLPFYTTTIGM 378 

+ L +1 G +IV L G+ ++G +V + G K+P + + + 

Sbjct: 116 GAEASVTLVGGQYLSIVGGMIVSL GLLTSLGAIV RAQGYPKIPMQVSLL-I 165 

Query: 379 WLIRVVLGYLLGIVFELGLLGVWMATIAmi 409 

++ + L V+ GLLGV AT+ + 
Sbjct: 166 NVLKAIFSALSIYVWGPGLLGVaWATVLSRL 196 
INTEGRAL Likelihood = -2.18 Transmembrane 30 - 46 ( 30 - [...])
PERIPHERAL Likelihood = 0.32
modified ALOM score: 2.78

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5564 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01629(313 - 1533 of 1878)

EGAD|165726|TM0815(20 - 436 of 464) conserved hypothetical protein {Thermotoga maritima}
OMNI|TM0815 conserved hypothetical protein GP|4981345|gb|AAD35897.1|AE001748_13|AE001748
conserved hypothetical protein {Thermotoga maritima} PIR|H72331|H72331 conserved
hypothetical protein - Thermotoga maritima (strain MSB8)
%Match = 13.9
%Identity = 29.4 %Similarity = 53.7
Matches = 120 Mismatches = 183 Conservative Sub.s = 99

48 78 108 138 168 198 228 258 

YK*RRDTGFRCYBmiKRFVRCPFT*GG'XESTKGRSNP*NGSTYLKY2Uara*RVSRFETIIKIRLF*NI*SEKETP*K^^ 

25 M 



HSLF]mPG**K!BIJITOYSKKIIQLAIPjyyiIENILQMLM(3VVIWn)Va^ 

IM = lhlll 11 = 1111 1= I - = 1 = 111 -I = =1 : = ll 

RYSLFKNYLPKEEVPEIRKELIKIi2iLPAMGENVLQMLFGMM)TAFLGHySMKAMSGVBLSNQVFWWQ^ 



528 558 588 609 639 669 699 729 

LLRKSLAGSKKDDAISVCSQAIFLTLLIGAVL---GIISIVFGQTFFiCLLGlTKSVAQVGC3LYLaiVGGGWTLGMLTTL 
35 , :| : |: ::|| | :| :M: j | | :| | || :: | : : :: : 

TIANAIGAGNRKAVRSLAVJNSVFLAIFTGVILTALTPLSDVLINIFPNLEGEIESSA- - - KEYLKVILSGSMGFSIMAVF 
100 110 120 130 140 150 



759 789 819 837 867 897 909 939 

GSFLRVQGQPRLPMYVSIFVNFLNAVLSGFAIF EWRYGLVGVAVSTLIARLIGICILA KYLPIKKIIKRM

= 11 1 Mil:: Mil III |: I ||:|:::|::| || : : ::| : 

■ SAMLRGAGDTRTPMIVTGLTNFLNIFLDYAMIFGKFGFPEMGWGAAVATILSRFVGAGILTYVIFKREEFQLRKBLVPP 
170 180 190 200 210 220 230 

969 999 1029 1059 1089 1119 1149 1179

TWKlSAQIWNLALPSAGERL^MRAGDWIVAIWQL(?IWVAGImIGETLTQE^>^MPGLGIATATlILTAK^^ 

I :1 : :| = l 1 == I == I- I Ih II = :::ll :||= 1 1 =1 1 1 = 1 

KWSSQKEILRVGFPTAIENFVFSTGVLMFiiNIIiLIflGRERYRGBRIGINVESIiSFMPAPGiSWRITTLVGRYNGMa^ 
250 260 270 280 290 300 310 


1209 1239 1269 1299 1329 1359 1383 1413 

IEETIQSSYYIGLVIJVlILISSFiyiLLAGKPLTQLFTraqPSAIKGSLIVILLSFVGVPATIGTLVYT--JSAWQGLCa^^ 
: |: = = 1:: : : ::| :|| ::||=:| |: | : : :]. = I I =1 II I 

VLGVIRQGWILSLLFQVTVGIIIFLFPEPLIRIFTSDPQIIEISKLPV— KIIGLFQFFLAIDSTMNGALRGTGNTLPPM 
330 340 350 360 370 380 390



1443 1473 1503 1533 1563 1593 1623 1653 

YTTTIGMWLIRVVLGYLLGIVFELGLLGVMyiATIADNIFRm^ 

I I =1 1= = === 1=11111 1= III 111 
IITFISIWTARLFVAFVMVKyFQLGLLG&WIQMIADIIFRSTLKLLFFLSGKWEKRAVLTRER\^ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
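
Each alignment in this section is headed by a summary line of the form "Identities = a/n (p%), Positives = b/n (q%), Gaps = c/n (r%)". As a minimal sketch, and not part of the original analysis, the following Python fragment shows how such a line could be parsed and the ratios recomputed, which is a convenient check on counts that may have been damaged in transcription. The function name `parse_summary` and the regular expression are illustrative assumptions.

```python
import re

# Matches "Identities = 183/432", "Positives = 266/432", "Gaps = 1/432".
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, length, percent)} for one summary line."""
    stats = {}
    for name, count, length in SUMMARY_FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, 100.0 * count / length)
    return stats


# Summary line taken from the Bacillus subtilis alignment of Example 2075.
stats = parse_summary(
    "Identities = 183/432 (42%) , Positives = 266/432 (61%) , Gaps = 1/432 (0%)"
)
for name, (count, length, percent) in stats.items():
    print(f"{name}: {count}/{length} = {percent:.1f}%")
```

Comparing the recomputed ratio with the percentage printed in parentheses flags lines in which a digit has been misread.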
Example 2073

A DNA sequence (GBSx2188) was identified in S.agalactiae <SEQ ID 6417> which encodes the amino
acid sequence <SEQ ID 6418>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2200 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD05671 GB:AE001448 THREONINE SYNTHASE [Helicobacter pylori
J99] 

Identities = 161/479 (33%) , Positives = 259/479 (53%) , Gaps = 17/479 (3%) 

Query: 14 KVTASQAILKGLADDGGLFTPITFPKVDLDFTKLKDASYQEVAKLVLSAFFDDFTEQELD 73 

K+ +A+L A GGIi+T F L++ SY E+ + V + + Ii 

Sbjct: 13 KIDFIEAVIjHEmPKMLYTLEHFET--LEWQDC]:/3MSYSELVEHVEBIOtt.EIPKK^ 70 

Query: 74 YCISQAra^KFDTTEI2^IVKIGDRYHL-ELFHGPTIAFKD^MSILPra^TTAAKRQGV 132 

+ + Y+ + API + +R + EL+HGP+4-AFKDMAL L L + A G 
Sbjct: 71 SALKR-YENPDNPKNPAPIFALNERLFVQELYHGPSLAFKDMALQPLASLFSNLAV— GK 127 

Query: 133 DimiVILTATSGDTGKAAMAGFADVPGTEIIViYPKNGVSYIQELQMITQAGQNTHWAI 192 

+ K ++L +TSGDTG A + G A +P ++ YPK+G S +Q+LQM+TQ N V + 
Sbjct: 128 NEKYLVLVSTSGDTQPATLEGLAGMEWFWCLYPKIXSTSLVQKLQMVTQNASNLIWFGV 107 

Query: 193 EGimJDAQTSVKEMBTOSIiRLKLSQQHMQLSSflNSNraiGRLVPQIVYYIYAYAQLVKSK 252 

G+FDDAQ ++K + + L + ++LS ANS+N GR+ QIVY+I+ + +L K 

Sbjct: 188 SGDFDnAQNRLKNLLKDDDFNEALKARQLKLSVANSVNFGRIAFQIVYHIWGFLELYKKS 247 

Query: 253 EISIGQPINFSVPTGNraNILAAYYASQIGLPVTKLICASNnNNVLTD 311 

1+ + I ++P+GKFGH h A+YA ++GL + K+ +N N+VL +F +T YD R 
Sbjct: 248 AINSKEKITI^IPS(OTFGNAIX38^AKKMGIJJIAKIKVVTNSND\rLREFIETGRY^^ 307 



Query: 371 VAGFATEQFVELDIKHLFDQYQYIEDPHTAVASAVYQRYQTETKDQTPAVIVSTASPyKF 430 

+++ 1+ ++ ++QY+ DPHTA A K ++ +TAS KF 

Sbjct: 366 SCASCSDEDCLKriQEVYAEHQYLIDPHTAT AUJASLKTHEKTLVSATASYEKF 419 

Query: 431 PCVVTKAIT-NKEEIQDFAAISIIiNDLSGVSLPKAVTDLQKaEVIHRTVVPTSNMRETV 488 

P A+ K+ D AA+ L + + + DL + + H+ V+ + ++ ++ 

Sbjct: 420 PKTTLIAUTOQKKNDDDKAALETLKNSYNTPDSQRICDLFERGIKHQEVLKIJ^ 478 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
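
The "Final Results" block reported for each protein assigns a certainty value to three candidate locations (cytoplasm, membrane, outside). The fragment below is a hedged sketch of how those lines could be read programmatically and the highest-scoring location selected. It assumes the lines have been normalised to the form "bacterial cytoplasm --- Certainty=0.2200 (Affirmative) < succ>"; the function name `parse_final_results` is hypothetical and is not taken from the prediction software.

```python
import re

# One "Final Results" line: location, certainty value, and the call in parentheses.
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, call)} for a block of result lines."""
    results = {}
    for site, certainty, call in RESULT_LINE.findall(text):
        results[site] = (float(certainty), call.strip())
    return results


# Values as reported for the protein of Example 2073.
report = """\
bacterial cytoplasm --- Certainty=0.2200 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
best_site = max(results, key=lambda site: results[site][0])
print(best_site, results[best_site])
```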

Example 2074 

A DNA sequence (GBSx2189) was identified in S.agalactiae <SEQ ID 6419> which encodes the amino
acid sequence <SEQ ID 6420>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3153 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9279> which encodes amino acid sequence <SEQ ID 9280> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF4d975 GB:AE002410 alcohol dehydrogenase, propanol-preferring
[Neisseria meningitidis MC58] 
Identities = 202/282 (71%) , Positives = 228/282 (80%) , Gaps = 1/282 (0%) 



GM+ + IV+ADYAVKVPEGLDPAQASSITCAGVTTYK&IK +G PGQWIA+YGAGGLGN 



TAVS AFN A++ VRAGG WA+GLP E M+LSI + VLDGI WGSLVGTRKDLEEAF 



FGAEGIiWP V+ +D AP +F EM G I GR V+D K 
QFGSSGLVVPKVQLEAIiDEAPAIFQEMREGKITGPMVIDMKK 340 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6421> which encodes the amino acid
sequence <SEQ ID 6422>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence




Final Results 

bacterial cytoplasm --- Certainty=0.2356 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 263/280 (93%) , Positives = 273/280 (96%) 

MGHEGIGIVEEIGEGVTSLRVGDRVSIAWFFEGCGHCEYCTTGRETLCRSVKKRGYSVDG 6 0 

+GHEGIGIVEEIGEGVTSL+VGDRVSmWFEGCGHCEYCTTGRETLCRSVKNAGYSVDG 
LGHEGIGIYEEIGEGVTSLKVGDRVSIAWFFEGCGHCEYCTTGRETLCRSVKNAGYSVDG 13 ! 

GMSEYAIVTADYAVKVPEGLDPAQASSITCAGVTTYKAIKEAGAAPGQWIAVYGAGGLGN 12 ( 
GMSEYA+VTADYAVKVPEGLDPAQASSITCAGVTTYKAIKEAGAAPGQWI ++GAGGLGN 



TAVSKSn^QAIDSVRAGGTVmVGLPSEYMELSIVKIVLDGI+WGSLVGTHKDLEE^^ 








AFGAEGLV PWEKVPVDTAP+VPDEMERGLIQGRKVLDF 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2075 

A DNA sequence (GBSx2190) was identified in S.agalactiae <SEQ ID 6423> which encodes the amino
acid sequence <SEQ ID 6424>. Analysis of this protein sequence reveals the following:



Possible sit 

INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



42 

have a cleavable N- 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



17 



n signal seq. 



Transmembrane 
Transmembrane 
Transmembrane 
Tranamembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



187 - 203 
243 - 259 
404 - 420 
120 - 136 
308 - 324 
378 - 394 
152 - 168 



229 - 262! 



119 - 136; 
307 - 3241 



152 - 168; 
271 - 287; 



Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9371> which encodes amino acid sequence <SEQ ID 9372>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC17857 GB:AF026147 Yojl [Bacillus subtilis] 
Identities = 183/432 (42%) , Positives = 266/432 (61%) , Gaps = 1/432 (0%) 

MKLFIPVLIYQFANFSATFIDSVMTGQYSQLHLAGVSTASNLWTPFFALLVGMISALVPV 60 

+ + IP+ I Q TP+D+VM+G^ S LAGV+ S+LWTP + L G++ A+ P+ 

LHILIPIFITQAGLSLITFLDTVMSGKVSPADLflGVAIGSSLWTPVYTGLAGILMAVTPI 74 



Q +Y+ +LS+ 4 



GASL ++LTYW 11+ ++ + Y 1+ T+ + +++GI,PIS +F 



E +IFA V L M+ F ++ la+HCJAAMNF+SL+Y PLS+S A 



OTYSRIGRLTAVGITSGTMiFLFLFRENVAaMYNSDPHFVAITAQFLTYSLFFQFADAYA 3 
+YS IG + A+G + T + LFRE +A MY SDP + +T FL Y+LFFQ +DA A 








AP+QG LRGYKD 



420 LNQRLQKIKKLY 431 



I- LG F YWIGLI G+ 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2076 

A DNA sequence (GBSx2191) was identified in S.agalactiae <SEQ ID 6425> which encodes the amino
acid sequence <SEQ ID 6426>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.60 Transmembrane 23 - 39 ( 23 - 39)

Final Results 

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
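
Predicted membrane-spanning segments are reported on lines of the form "INTEGRAL Likelihood = -2.60 Transmembrane 23 - 39 ( 23 - 39)". By way of illustration only, the following sketch extracts the likelihood and the residue range from such lines and derives the length of each segment. The `Helix` record and the `parse_helices` function are assumed names introduced for this example.

```python
import re
from typing import NamedTuple


class Helix(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_helices(text):
    """Return one Helix per INTEGRAL line found in the text."""
    return [
        Helix(float(likelihood), int(start), int(end))
        for likelihood, start, end in INTEGRAL_LINE.findall(text)
    ]


# Line reported for the protein of Example 2076.
helices = parse_helices("INTEGRAL Likelihood = -2.60 Transmembrane 23 - 39 ( 23 - 39)")
for helix in helices:
    length = helix.end - helix.start + 1
    print(f"residues {helix.start}-{helix.end} ({length} aa), likelihood {helix.likelihood}")
```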

Example 2077 

A DNA sequence (GBSx2192) was identified in S.agalactiae <SEQ ID 6427> which encodes the amino
acid sequence <SEQ ID 6428>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3829 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC06891 GB:AE000703 hypothetical protein [Aquifex aeolicus] 
Identities = 72/213 (33%) Positives = 115/213 (53%), Gaps = 11/213 (5%) 



Query: 36 RPKILMHVCCAPCSTYTLEYLSQ---WADVTIYFANSKIHPKDEYYRREYVTQKFVHDRN 92

+ KIL+H+CCAP + y L+ L + +++ YF + NIHP +EY R T++ +
Sbjct: 3 KSKILVHICCAPDAIYFIiKKLREDYPESEIIGYFYDPNIHPYEEYRLRYLETERICKELG 62

Query: 93 KNTGYSVQFLSAPYEPNEFFKIVHGLEEEPEGGDRCKVCYDFRLDJCTAEKRVELGFDYFG 152

N + Y+ + + V G E+EPE G RC++C+D+RL+K+AE A ELG D

Sbjct: 63 IN LIBGEYDLENWLERVKGYEDEPERGKRCQICFDYELEKSftEVAKELGCDALT 116
A related DNA sequence was identified in S. pyogenes <SEQ ID 6429> which encodes the amino acid
sequence <SEQ ID 6430>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3498 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 254-256 

The protein has homology with the following sequences in the databases:

>GP:AAC06891 GB:AE000703 hypothetical protein [Aquifex aeolicus]
Identities = 65/182 (35%) , Positives = 106/182 (57%) , Gaps = 9/182 (4%) 

Query: 39 RPSILMHVCCAPCSTYTLEyiiTQF ADITVYFANSNIHPKDEYHRRAYVTQQFVSEFN 95 

+ IIi+H+CCRP + Y L+ L + ++I YF + HIHP +EY R T++ E . ' 

Sbjct: 3 KSKILVHICCRPDAIYFLKKIiIffiDYPESEIIGyPTOPNIHPYEEYRLRYLETKRICKBLG 62 

Query: 96 AKTGNTVQFLEADYVENEYVRQVRGLEEEPEGGDRCRVCFDYRIDKraQKaVELGFDYFA 155 

+ +E +Y ++ +V+G E+EPE G RC++CFDYRL+K+A+ A ELG D 
Sbjct: 63 INLIEGEYDLENWLERVKGYEDEPERGKRCQICPDYRLEKSAEVAKELGCDRLT 116 

Query: 156 SALTISPHKNSQTIKDVGIDVQKVYTTKYLPSDFKKN0GYRRSVEMCEEYDIYRQCYCGC 215 

+ L +SP K+ + G + K ++L D++K G + , ++ +E +IY+Q YCGC 
Sbjct: 117 TTLLNBPKKSIPQLKKAGEBATKRTGIEFLAPDYRKGGGTQEMFKLSKEREIYQQDYCGC 176 

Query: 216 VY 217 
+Y 

Sbjct: 177 lY 178 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 184/255 (72%), Positives = 219/255 (85%) 

Query: 1 MIDVENILEKMKPNQKINYDWVMQQMVKQWQASDIRPKILMHVCCAPCSTYTLEYLSQWA 60 

MID++ IL M PNQKINYD VMQQM K W+ +RP ILMHVCCAPCSTYTLEYL+Q+A 
Sbjct: 4 MIDLQEILAMMNPNQKINYDRVMQQMAKVWEKESVRPSILMEIVCCAPCSTYTLEYLTQFA 63 

Query: 61 DVTIYFANSNIHPKDEYYRREYVTQKFVHDFNKNTGYSVQFLSAPYEPNEFFKIVHGLEE 120 

D+T+YFANSNIHPKDEY+RR YVTQ+FV +FN TO +VQFL A Y PNE+ + V GLEE 
Sbjct: 64 DITVYFANSNIHPKDEYHRRAYOTQQFVSEENAKTOWIVQFI^IADYVPKEYVRQVRGLEE 123 

Query: 121 EPEaSDRCKVCYDFRrjDKTAEKAVELGFDYFGSALTISPHKNSQTINTIGIDVQKIYDTQ 180 

EPEGGDRC+VC+D-hRLDKIA+KRVELGFDYF SRiyriSPHKNaOTIH +aiDVQK+Y T+ 
Sbjct: 124 EPEGGDRCRVCFDYRLDKIAQKAVELGFDYFASALTISPHKNSQTIHDVGIDVQKVYTTK 183 

Query: 181 YI■Pa3LKKNESYQRSVEMCKDYDIYRQCYCGCIFGAKDQGI^^LLQIKKnaKAPVSDKD6K 240 

YLPSD KKN GY+RSVEMC++YDIYRQCYCGC++ AK QGI+L+Q+KKDAKAF++DKD 
Sbjct: 184 YLPSDFKKHNGYRRSVEMCEEYDIYRQCYCGCVYAaKMQGIDLVQVKKDAKAFMM 243 

Query: 241 EEFPNIRFTFNGKSM 255 

+F +IRF++ G M 
Sbjct: 244 HDFTHIRPSYRGDEM 258 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
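
The S. pyogenes protein of this Example is annotated with "RGD motif: 254-256". As a brief sketch of how such a motif position can be derived, the fragment below searches a sequence fragment for a motif and converts the match to 1-based protein coordinates. The fragment used is the final Sbjct row of the GAS/GBS alignment above (residues 244 to 258) and may itself contain transcription errors; the function name `find_motif` is an assumption made for this illustration.

```python
def find_motif(fragment, motif, first_position):
    """Return 1-based (start, end) positions of every occurrence of motif.

    first_position is the protein coordinate of the first residue of fragment.
    """
    hits = []
    index = fragment.find(motif)
    while index != -1:
        start = first_position + index
        hits.append((start, start + len(motif) - 1))
        index = fragment.find(motif, index + 1)
    return hits


# Last Sbjct row of the GAS/GBS alignment: "Sbjct: 244 HDFTHIRPSYRGDEM 258".
hits = find_motif("HDFTHIRPSYRGDEM", "RGD", 244)
print(hits)
```

The single hit at residues 254 to 256 agrees with the motif annotation reported for the S. pyogenes sequence.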

Example 2078 

A DNA sequence (GBSx2193) was identified in S.agalactiae <SEQ ID 6431> which encodes the amino
acid sequence <SEQ ID 6432>. Analysis of this protein sequence reveals the following: 
N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4216 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14809 GB:Z99118 excinuclease ABC (subunit C) [Bacillus subtilis] 
Identities = 189/333 (56%) , Positives = 244/333 (72%) 

MNELIKHKLEIjLPDSPGCYLHKDKNGTI lYVQiaKNLKNRVKSYFHGSHNTKTELLVSEI 6 0 
MN+ +K KL LIiPD PGCVL 103+ T+IYVGKRK LKHRV+SYF GSH+ Kr+ LV+EI 
MNKQIiKEaaLMiLPDQPGCYLMKDRQQTVIYVGKJiKVLKiniVM 60 

EDFEYIVTTSOTEaiiLIiEINLIQENMPKYHIRIOTDKSYPYIKIl^ 120 
EDFEYIVT+SN EAL+LE+NLI+++ PKYH+ DKDDK+YP+IK+T+ER+PRL++TR VKK 



G YPGPYP+ Aft. E K+DLDRL+P +KC+ ++VC YYHLGQC A V 



L E + +FL G N++ h EKM AA +EFERA E RD I I 



D+ DRDVF Y DEGWMCVQVFF+R GKLI+RDV+MFP Y E +B+FLT+IGQFY 








FLPKE+ +P 



There is also homology to SEQ ID 2568. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2079 

A DNA sequence (GBSx2194) was identified in S.agalactiae <SEQ ID 6433> which encodes the amino 
acid sequence <SEQ ID 6434>. This protein is predicted to be maltose operon transcriptional repressor 
(rbsR). Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3761 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9393> which encodes amino acid sequence <SEQ ID 9394>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD02112 GB:AF039082 putative maltose operon transcriptional 
repressor [Lactococcus lactis]
Identities = 64/166 (38%), Positives = 105/166 (62%), Gaps = 13/166 (7%)
Query: 1 MGKSAIDYIYKKCSHKSIQFVTDDLNSEVSEERYLGYFKGARKLGLNQKPALLFDRGMPQV 60

+G+ A+ L + H++I FVTD +EV EERY G+ A +LGL+ LLF N + 
Sbjct: 169 LGREAVRLLAQLNHQNISFVTDTKETEVFEERYQGFKDEAERLGLSHD--LLFMDSNFSL -226 ■. ■ 

Query: 51 LEEFiNRVKEEETTALIVIGDTVSVRVMQFLSFYKLKVPDDISIMTENNSLFSHLIHPYL 120 

E TA1jH-V+ D +S++V++ L L VP+D+S++T+NHS+F +IHPYL 

Sbjct: 227 RME TMiVVtroDVLSLKVVERmSQGIJ!TO>EDVSLITY]SlNSIFC3aU*IIHPYL 276 

Query: 121 STFDINVNNLGRTSVEELID1IKSP0KVFSETI1VPFTLEERESVR 166 

+TFDI++ LG +++++++D+ + + + +TII PF L RES + 
Sbjct: 277 TTFDIHIEQLGASAIKKILDLRDNKENLPEKTII-PFELIVRESTK 321 

There is also homology to SEQ ID 5082. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2080 

A DNA sequence (GBSx2195) was identified in S.agalactiae <SEQ ID 6435> which encodes the amino 
acid sequence <SEQ ID 6436>. This protein is predicted to be 4-alpha-glucanotransferase (malQ). Analysis 
of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2003 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAM6923 GB:J01796 amylomaltase [Streptococcus pneumoniae]
Identities = 250/500 (50%), Positives = 329/500 (65%), Gaps = 4/500 (0%)

MKKRASGVLMHITSLPGDLGIGTFGREAYAFVDFLXffiTDQKFWQILPLTTTSPGDSPYQS 60 
MKKR SGVLMHI+SLPG GIG+FG+ AY PVDFLV T Q++WQILPL TS+GDSPYQS 
MKKRQSGVLMHISSIiPGAYGISSFGQSAYDFVDFLVRTKQRYWQIIiPLQATSYGDSPYQS 6 0 








h FG D VDYA ++ RRP+LEK&VK F 



f W+ EaE+MAIKE+F N A EW D R+ AL YR++ 



L++ + YH VTQYFF++QW +LK YAMD I+I+GDMPIYVh- DS ++« P LFK D + 



lAG P D+FS GQLWGNPIY+W+ + + Wl-JI R++ K+YD +RIDHF+G 



H++ Y GTHDN 
PITETVLRTLYATCSQTTITCMQDLLDKPADSRMNMPl^ITVGGNWQWRMRKEDL 478
R E + +LRT++ + +VS I MCDLL+ _+RMN P+T+GGNW WRM ++ L 
Sbjct: 419 TNRKEYETVVHMLRTVFSSVSFMAIATMQDLLELDEAARIffilFPSTLGG^^ 47B 

Query: 479 TEttJRKAFLKEITTISNRGNK 498 ' 

T + L ++TTiy R N+ 
Sbjct: 479 TPAVEEGIiLDLTTIXRRINE 498 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6437> which encodes the amino acid 
sequence <SEQ ID 6438>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.85 Transmembrane 435 - 451 ( 435 - 451) 

Final Results 

bacterial membrane --- Certainty=0.1341 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 313/495 (63%) , Positives = 387/495 (77%) 

Query: 1 IVIKKRASGVIiMHITSLPGDLGISTFGREaYAFVDFLYETDQKFWQIIjPLTTTSEGDSPYQS 60 

M KRASG+LMHI+SLPG GIGTF6+ A+ FVDFL ET Q +WQILPLTTTSFGDSPYQS 
Sbjct: 1 MNKEASGILMHISSLPGKFGIGTFGKSAFEFVDFLaETKQTXWQILPLTTTSFGDSPYQS 60 

Query: 61 FSAVAGNTHLIDPnLLTLEGFISKDDYQNISFGQDPEWDYAGLFEKRRPVLEKAVKNFL 120 

FSA+AGNTH IDF+LL + + D +I+FG +PE VDYA LF+ RRP+LEKAV+ F+ 
Sbjct: 61 FSAIAfSNTHFIDFEajLVDDELLEAADLCDITFGTNPEAVDYAQLFQVRRPLLEKAVRAFV 120 

Query: 121 QEERATRra.SDFLQEEKWVTDFAEFMAIKEHFGNKALQEOT)DKAIIRREEE!aLaGYRQKL 180 

E+ L F W+TDFAEPMA+KE+F NKALQ+WDD+ +I+R+B++L YE+ L 

Sbjct: 121 AEQENVCKLEAFETASSVn^TDFAEFMALKEYFHKKMiQDWDDBTVIKRQBIBMmYREIi 180 

Query: 181 SEVIKYHEVTQYFFYKQWFELKEYANDKGIQIIGDMPIYVSaDSVEVWTMPELFKLDRDK 240 

++ 1 YH+V QYFFY+QW LK YAN KGI + IIGDMPIYVSADSVEVWTMPELFK+D DK 
Sbjct: 181 AKKITYHKVCQYFFYQQWSALKTYANHKGIEIIGDMPIYVSADSVEVWTMPELFKVDSDK 240 

Query: 241 QPLAIAGVPADDFSDDGQLWGNPIYNWDYHKESDFDMWIYRIQSGVKMYDYLRIDHFKGF 300 

+PL lAGVPAD FS+DGQLWGNP YNW H++S+F WWIYRIQ K+YD LRIDHFKGF 
Sbjct: 241 KPLFIAGVPADGPSEDGQLWGNPTYNWSAHEKSNPAMWIYRIQESFKLYDQLRIDHFKGF 300 

Query: 301 SDYWEIRGDYQTANDGSWQPAPGPELFATIKEKLGDLPIIAENLGYIDERAERLLaGTGF 360 

SD+WEI +TA +G W APG LF+ ++E LG+LPIIAENLGYIDE+AE+LLA TGF 
Sbjct: 301 SDFWEIPAGDraMOTSHWSJSAPGIALFSAVREALGELPIIAENIXSYIDEKAEQLLaSTO^ 360 

Query: 361 PGMKIMEFGFYDTTCNSlDIPHNYTEIWmYAGTHDNEVINGWFENLTVEQK^^ 420 

PGMKI+EFG +D T SID+PH Y N +AY 6THDNEV+NGW++NL+ EQ + NY+ 
Sbjct: 361 PGMKILEFGLFDITSQSIDLPHYYDRNCVAYTGTHDNEWNGWYDNLSEEQVHFVNIIYLH 420 

Query: 421 RLPNEPITETVLRTLYATVSQTTITCMQDLLDKPADSRMmiPKTVGGNWQWRMRKEDLTE 480 

+ +E IT+ +LRT++A+V T I C+QDLLDK SEMNMPNT+GGNWQWRM +L + 
Sbjct: 421 KHADESITKAMLRTIFASVCDTAILCIQDLI£)KIX3KSRMNMPNTIGGNHQVra4LIX3EI^^ 480 

Query: 481 NRKAFLKEITTIYNR 495 

+ K +L +T +Y R 
Sbjct: 481 DHKDYLIYLTDLYGR 495 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
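
Each alignment row carries a start and an end coordinate, so the number of residues on the row (gap characters excluded) must equal end minus start plus one. The sketch below applies that relation as a consistency check, which helps to identify rows whose coordinates or residues were corrupted in transcription. It is an illustrative aid under that stated assumption and not part of the original analysis; `check_row` is a hypothetical name, and rows whose residue string contains internal spaces are simply reported as unparsable.

```python
import re

# "Query: 481 NRKAFLKEITTIYNR 495" -> label, start, residues, end
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def check_row(line):
    """Return (label, start, end, consistent) or None if the row cannot be parsed."""
    match = ROW.match(line)
    if match is None:
        return None
    label = match.group(1)
    start, end = int(match.group(2)), int(match.group(4))
    residues = sum(1 for ch in match.group(3) if ch != "-")  # gaps consume no position
    return label, start, end, start + residues - 1 == end


# Final rows of the GAS/GBS alignment in Example 2080.
good_query = check_row("Query: 481 NRKAFLKEITTIYNR 495")
good_sbjct = check_row("Sbjct: 481 DHKDYLIYLTDLYGR 495")
# A row from Example 2073 whose end coordinate was evidently misread.
bad_row = check_row(
    "Sbjct: 128 NEKYLVLVSTSGDTQPATLEGLAGMEWFWCLYPKIXSTSLVQKLQMVTQNASNLIWFGV 107"
)
print(good_query, good_sbjct, bad_row)
```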
Example 2081

A DNA sequence (GBSx2196) was identified in S.agalactiae <SEQ ID 6439> which encodes the amino 
acid sequence <SEQ ID 6440>. This protein is predicted to be glycogen phosphorylase (malP). Analysis of
this protein sequence reveals the following: 

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2678 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00218 GB:AF008220 glycogen phosphorylase [Bacillus subtilis] 
Identities = 297/776 (38%) , Positives = 452/776 (57%) , Gaps = 41/776 (5%) 

Query: 13 GKVLSELraJEEIYVELLNFVKEERAA KSKNSSQRKVYYISaEFLIGKLIiSNNIli 65 

GK + + Y L N V+E +A 
Sbjct: 21 GKSFKDSAKLDQYKTLGNMVREYISADW] 



Query: 66 INLGIYKDVKKELELVGKSIAEIEDVEPEPSLGNGGLGRLASCFIDSISSLGINGEGVGL 125 

+NLG+ V+ L+ 4G ++ EI +E + LGKG3I,GRriA+CF+DS++SL + G G+G+ 
Sbjct: 81 MMjGTODVVEAGLKEIGINLEEILQIEKDAGLGNGGLGRLAACFLDSLASLNLPGHGMGI 140 

Query: 126 JSTYHCGLFKQVFRNNQQEftEftNYWIEN-NSWLVPT-DISYDVPF- RDFTLKSRL 175 

Y GLF+Q + Q W++N N W V D + DVPP + L R 

Sbjct: 141 RYKHGLFEQKrVDGHQVELPEQWLKNGNVWEVRlMQaVDVPFWGKVHmEKSGRI^ 200 

Query: 176 DR IDVLGYKaCDTKNYIiKLFDIDGUDYNLIEKBITFDKTEIKKNLTIiFLYP 225 

+4- I ++Gy+ T N L L++ + Y G + ++ PLYP 

Sbjct: 201 EQa.TIVTAVPYDIPIIGYETGTVOTLRLWNftE--PYAHYHGGNILSYKRETEaVSEFLYP 258 

Query: 226 DDSDKNGELLRIYQQYEMVSNAflQLLlDEAIERGSNLHDLAEYAYVQINDTHPSMVIPEL 285 

DD+ G++LR+ QQYP+V + + +++ + +L L + + IKIDTHP++ +EEL 
Sbjct: 259 DDTHDEGKILRLKQQYPLVCaSLKSIVN^mlKTHKSLSGLHKKySIHINDTHPAIlAVEEL 318 

Query: 286 IRLLTEKHGFEFDEAVSVViamVGYTNHTIIiaEaLEKWPLEYiaiEWPHLWIIK^^ 345 

+R+L ++ ++EA + + + YTNHT L+EALEKWP+ ++P + II+++++ 

Sbjct: 319 MRILLDEEiraSWEEAVffllTVHTISYTmrTLSEALEKWPIHLFKPIiPRMYMIIEEIN^ 378 

Query: 346 IRE EQOTPEVQIIDEftGRVHMaHMDIHFSTSVNQVaaiHTEIIiKNSELKVFY 397 

+ E I G V MAH+ I S SVM3VA +H++1LK K++ F+ 
Sbjct: 379 FCRAVWEKYPGDWKRIENMAITAHGVVKMAHLAIVGSYSVl^ 438 

Query: 398 DIYPDKFNNKTNGITFRRWLEFflNQDLaDYLKELIGDSYLTDATQIiEKLLTYMSIIEVHD 457 

++P++FNNKTNGI RRWL AN L+ + E IGD ++ L +L YA + 

Sbjct: 439 LLFPNRFISnmilGIAHRRWIiLKfOIPGLSAIITEAIGDEWVKQPESLIRLEPYATDPAFIE 498 

Query: 458 KliAAIKFKNKLALKRYLKENKGIELDEYSIIDTQIKKFEEYKRQQMNALYVIHKYLEIKR 517 

+ K K K L + G+ ++ SI D Q+KR E YKRQ +N L-H-1-+ Y +K 
Sbjct: 499 QFQNNKSKKKQELADLIFCTAGVVTOPESIFDVQVKRLHAYKRQLLETOjHIMYLYNRLKE 558 

Query: 518 GH-FPSRKLTVIFGGKAAPAYTIAQDIIHIiILCI^ELINISnDPEWKYtNVHLVENYNVTV 576 

F T IFG KA+P+Y A+ II LI ++E +N DP V + + V +ENY V++ 

Sbjct: 559 DSGFSIYPQTFIFGAKASPSYYYAKKIIKLIHSVAEKVlSnfHPAVKQLIKVVFLENYRVSM 618 

Query: 577 AEKLIPATDISEQISLASKEASGT6mKFMmGALTLGTMDG2aiVEIAELAGKHNIYTFG 636 

AE++ PA+D+SEQIS ASKEASGT6NMKFM+NGALT+GT DGAN+EI E G + lYTFG 
Sbjct: 619 AERIFPASDVSEQISiaSKEASGTGtmKFIiMJGALTIGTHDGaNIEILERVGPDCIYTFG 678 

Query: 637 KDSDTIIHLYETSGYRSKDYYDKDKVIRERVDFIISDDIVSLGNAERLKRLHDELV-GKD 695 

+D +++ E GYRSh-hYY D+ IR+ D +1+ G A+ + + D L+ D 

Sbjct: 679 I.KADEVLSYQEMGGYRSREYYQHDRRIRQVADQLINGFFE--GEADEFESIFDSI1LPHND 736 
Query: 696 WFMTLIDLKEYIAVKEQVLADYEDYESWlMKKi/IHNIAKAGFFSSDRTIEQYNQDIW 751

+ L D Y +E++ ADY + W++ I MIA +G+FSSDRTI +Y 4DIW 
Sbjct: 737 EYFVLKDFSSYflDRQERIQADYRERRKWSEHSIVNIIAHSGYFSSDRTIREYAKDIW 792 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6441> which encodes the amino acid
sequence <SEQ ID 6442>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.71 Transmembrane 538 - 554 ( 538 - 554)

Final Results

bacterial membrane --- Certainty=0.2084 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 629/754 (83%) , Positives = 696/754 (91%) , Gaps = 2/754 (0%) 

MTiOTFTTYVGQQ-GKVLSELTNEErXVELIMFVKEKUiAKSKNSSQEyW^ 59 
MTR FT YV + GK L++ +NEEIY+ LIJSIFVKEEA+ K+KNS4-+RKVYYISJiEFIiIGK 
MTR-F 



LLSNNLINLGIYKD+K+EL GKSIAE+EDVE EPSLGNGGLGRLASCFIDSI+SLGIN 






GEGVGRIYHCGLFKQVF++N+QEaE N+WIE+-HSMLVPTDISYDVPF++FTLKSRLDRID 



QYFMVSNAAQL+IDEAIERGSNLHDLA+YAYVQINDTHPSMVIPELIRLLTEKHGF+FDE 



E+GRVHMAHMDIHF+TSVNGVAALHTEILKNSELK FYD+YP+KFNNKTNGITFRRWLEF 



ANQDLRDY+KELIGD YLTDAT+I.EKL+ +AD VH KLA IKF NKIALKRYIiK+NK 



AQDIIHLILCLSELINNDPEV+ YLNVHLVENYNVTVAE LIPATDISEQISLASKEASG 



4- GN KRL RL+ EL+ KDWFMTLIDL+KYI VKE++LADYED 



Query: 720 YESWNKKVIHNIAKAGFFSSDRTIEQYNQDIWHS 753 

+ W KV+HNIAKAGFFSSDRTIEQYN+DIWHS 
Sbjct: 720 QDLWMTKWHNIAKAGFFSSDRTIEQyUEDIWHS 753 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2082 

A DNA sequence (GBSx2197) was identified in S.agalactiae <SEQ ID 6443> which encodes the amino
acid sequence <SEQ ID 6444>. This protein is predicted to be glycerol-3-phosphatase (glpT).
Analysis of this protein sequence reveals the following:



INTEGRTUIj 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



D N- terminal signal sequence 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



64 



- 127) 
■ 214) 

- 385) 

- 127) 

- 424) 

- 182) 



Final Results

bacterial membrane --- Certainty=0.5352 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAC44575 GB:U28354 IS629 ORFB fused with sequences similar to E.
coli GlpT and UhpT proteins, Swiss-Prot Accession Numbers
P08194 and P09836; Method: conceptual translation








YL1-NGW+QGMGYPPGA4-TLV+WY+++ERI +AT+MNLSHN GGA 



h +II+K+I+ N KL++ +Y FVYILRYGIVSW PKFL 



L AYW++P+G +Y-t- L4-+ IL+ LG ++YGPVM +GLy+MELVPK AAGAASGL+GTFSY 



- G+ +ATL +G+++D-1- GWG 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6445> which encodes the amino acid 
sequence <SEQ ID 6446>. Analysis of this protein sequence reveals the following: 

Possible site: 36 





>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -[...].37 Transmembrane 185 - 201 ( 175 - 208)
INTEGRAL Likelihood = -9.13 Transmembrane 114 - 130 ( 90 - 134)
INTEGRAL Likelihood = [...].75 Transmembrane 322 - 338 ( 320 - 345)
INTEGRAL Likelihood = -6.79 Transmembrane 421 - 437 ( 419 - 439)
INTEGRAL Likelihood = -6.37 Transmembrane 91 - 107 ( 90 - 113)
INTEGRAL Likelihood = -5.36 Transmembrane 163 - 179 ( 161 - 181)
INTEGRAL Likelihood = -5.20 Transmembrane 350 - 366 ( 347 - 371)
INTEGRAL Likelihood = -4.41 Transmembrane 23 - 39 ( 22 - 41)
INTEGRAL Likelihood = -3.77 Transmembrane 257 - 273 ( 249 - 273)
INTEGRAL Likelihood = [...].33 Transmembrane 61 - 77 ( 61 - 77)
INTEGRAL Likelihood = -1.28 Transmembrane 383 - 399 ( 383 - 399)
INTEGRAL Likelihood = -0.90 Transmembrane 299 - 315 ( 299 - 315)



Final Results

bacterial membrane --- Certainty=0.5946 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:A2^96050 GB:AE004355 glycerol-3-phosphate transporter [Vibrio cholerae] 
Identities = 128/438 (29%) , Positives = 215/438 (48%) , Gaps = 17/438 (3%) 

Query: 1 LFMEEDraKRBP-EKFTQFLRRQKVVFFVAFF-GYVCAYLVRNHFKLMSNTI^l^^ 58

LF + +R P +K' R + F+ F GY YL R NF L + +++ G+ + 

Sbjct: 21 LFKPAAHTQRLPSDKUDSVySRLRWQLFIGIFVGYAGYYLGRKNFSL-AMPYLIEQGFSR 79 

Query: 59 AQIAILLSCLTVSYGLAKFYMGALGDRVSLRKLFSISLGASALICILIGFF NSSIWV 115 

4 + L ++++YGL+KF MG + DR + R S L SAL+ GP S+ 
Sbjct: 80 GDLGVALGAVSIAYGLSKFLMGNVSDRSNPRYFLSAGLLLSALVMFCFGFMPWATGSITA 139 



Query: 116 LGILLVLCGWQGaUiAPASQaMIflNYFPNKrRGGAIAGWNISQNMGSALLPLTIALLTSM 175 

+ ILL L G QG PA + +++ K RG ++ WN++ N+G L I + + 
Sbjct: 140 MFILLFLNGWFQGMGWPACGRTMVHWWSRKERGEIVSVMNVAHNVGGGL IGPIFLL 195 

Query: 176 GLWPANGNILLAFLIPGVLVELFALCCWKLGGDNPESEGLDSLRTMYGDSGESAVASEE 235 

GL + N + AF +P L A+ W + D P+S GL + D + S E 

Sbjct: 196 GLira-ENDDWRTAFYyPAFFAVLVAVFTWLVMRDTPQSaSLPPIEEYKNDYPDDYDKSHE 254

Query: 236 EKHNLSYWQLIWKYVFCNPSLLLVAAVNVALYFVRFGIEDWMPIYISQVANMSEAHIHFA 295 

+ ++ ++ +KYVF N L +A N +Y +R+G+ DW P+YL + + + +A 
Sbjct: 255 NE--MTAKEIFFKyVFNNKLLWSIAIM]AFVYLIRYGVLDWAPVYLKEAKHFTTO 312 

Query: 296 ISMLEWVAIPGSLVFAWLAVR-YPNKMAKVGAIGLFVLAAIVFVYERLTATGAPIIYFLLL 354 

+ EW IPG+L+ W++ + ++ AG + + ++ VVY GP + 

Sbjct: 313 YFLYEWASIPGTLLCGWISDKVFKGRRAPAGILEMVLVTLAVLVY-WFNPAGNPAVDMAA 371 

Query: 355 VIAGirjGSBIYGPQLIVNILTINFVPLNVAGTAIGFVGVTAYLIGNMQAHWEMPILADGF 414 

++A +6 LIYGP +++ + + ' P AGTA GG+VLG + AN++ DF 
Sbjct: 372 LVA--IGFLIYGPVra,IGLYALELAPKKAAGTAAGLTGLFGYLGGAVAANAILaYTVDHF 429 

Query: 415 GWFWSYIWAALSAFSAV 432 

GW ++V+ A S + 
Sbjct: 430 GWDGGFMVLVASCVLSVL 447 
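
The summary line that heads each alignment (identities, positives, gaps) can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis; the function names are hypothetical. It assumes the reported identity percentage is the integer part of the ratio, which agrees with the identity figures quoted in this document. The Positives percentages are parsed but not checked here.

```python
import re

# Matches "Identities = 128/438 (29%)" and the analogous Positives/Gaps fields,
# tolerating the stray spaces found in the extracted text.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_stats(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer part of 100 * count / length (assumed reporting convention)."""
    return (100 * count) // length


summary = "Identities = 128/438 (29%) , Positives = 215/438 (48%) , Gaps = 17/438 (3%)"
stats = parse_alignment_stats(summary)
count, length, reported = stats["Identities"]
print(count, length, reported, truncated_percent(count, length))
```

A mismatch between the reported and recomputed identity percentage would point to a digit damaged during text extraction rather than to a property of the alignment itself.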



An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/439 (26%) , Positives = 203/439 (45%) , Gaps = 27/439 (6%) 


Query: 23 KYPRYRVQVLISIFVGYMGYYFVRNTTSILSGILNMS ATEIGIITCASYIAYGLSK 78 

++ R + V F GY+ Y VRN ++S + + +1 1+ h-hYGL+K 

Sbjct: 17 QFIJaiQKVVFFVAFFGYVCAYLVRNNFKLMSNTIMVQNGMDKAQIAILLSCLTVSYGLAK 76

79


Sbjct; 


77 


Query: 


139 


Sbjct: 


134 


Query: 


195 


Sbjct: 


191 


Query: 


255 


Sbjct: 


248 


Query: 


315 








373 


Sbjct: 


361 




433 


Sb j ct : 


421 



riWNLSHNFGGAIAPI LTGVGLAIiAGNDSLNQARAAYW 194 

WN+S N G A+ P4- LT +GL + N ++ A+ 
iGWNISQNMGSALLPLTIALLTSMGLWPANGNI— -LIAFL 190 



^ L D PES GL 



KY+ N L+ + ++ +Y +R+GI W P +L+ I S+ E 

KyVFCNPSLLLVAAVNVALYFVRFGIEDWMPIYLSQVAHMSEaHIHFA ISMLEWV 302 



I G L +L+ 4 



G+ lYGP ++V + + VP AGAGG+Y+G 



+S F+A+ +L++ 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2083 

A DNA sequence (GBSx2198) was identified in S.agalactiae <SEQ ID 6447> which encodes the amino 
acid sequence <SEQ ID 6448>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.3202 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
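
The "Final Results" block lists one certainty value per candidate location, and the location with the highest certainty is the predicted one. A minimal Python sketch for extracting that prediction follows; it is illustrative only, the function names are hypothetical, and the tolerance for stray spaces inside the number is an assumption made to cope with text-extraction damage.

```python
import re

# "bacterial cytoplasm --- Certainty=0.3202 (Affirmative) < succ>"
# The numeric group tolerates stray spaces such as "0 . 0000".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9 .]+?)\s*\(([^)]*)\)"
)


def parse_result(line):
    """Return (location, certainty, verdict), or None if the line does not match."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    location, raw_value, verdict = match.groups()
    return location, float(raw_value.replace(" ", "")), verdict.strip()


def most_likely_location(lines):
    """Pick the parsed result with the highest certainty."""
    parsed = [r for r in (parse_result(line) for line in lines) if r is not None]
    return max(parsed, key=lambda r: r[1])


results = [
    "bacterial cytoplasm --- Certainty=0.3202 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco",
]
best = most_likely_location(results)
print(best)
```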

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6449> which encodes the amino acid 
sequence <SEQ ID 6450>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4473 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/100 (54%), Positives = 67/100 (67%) 

Query: 1 MTYELCLEYGTYPLRPVDAWRDEIOTAPAFITEDKKIJjELLEEVNTLFHELFLTIECSFH 60 

MTYELCLEYGTYPL VDA+ E P FX ED+ L LE +N LFH+LF+TIE FH 
Sbjct: 1 MTYEKLEYGTYPLSRVDAYWGEDQNPPTFIQEDRLLCHKLETMNHLFHDLFVTIESQFH 60

Query: 61 YlGHDFPEKRAKITQIYHVIIEHLSIHYPEinDIKIESLIM 100

Y+G + PEKRA.+ I +Y + L Y +Y IKIE+ L+ 
Sbjct: 61 YVGFNMPEKRAQIEILYQEVATILKSKYKDYPIKIETFLL 100 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2084 

A DNA sequence (GBSx2199) was identified in S.agalactiae <SEQ ID 6451> which encodes the amino 
acid sequence <SEQ ID 6452>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2369 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MSEKIRVLLYYKYVSIENAEEYAAKHLEFCKSIGLKGRILIADEGINGTVSGDYETTQKY 60 

M++ RVLLYY+YV IE+ E +A KHL CK +GLKGRIL4-ADEGINGTVSG E T Y 
Sbjct: 1 MTQDYRVLLYYQYVPIEDGETFAQKHLADCKEIGLKGRILVADEGINGTVSGTIEQTNAY 60 

Query: 61 MDWVHSDERFADLWFKIDEENQC3RFRKMFVRYKKEIVHI.GLEDNNFDSDINELETTGEYL 120 

M+ + +D RF+ FKIDE Q AF+KM VRY+ E4-V+L LED D+NPLE TG YL 

Sbjct: 61 MELMKNDPRFSSTIFKIDEMQNAFKKMHVRYRPELVMLSLED DVNPLELTGAYL 115 

Query: 121 NPKQFraMLDEDiyVTiDTRMrrYEYDLGHFRGAIRPDIimFRELPQWVRDNro 180 

+PK+F+EA+LDE+TW+D R1JDYE+DLGHFRGAIRP+IR+FRELPQW+RDNK++FMEKRV 
Sbjct: 116 DPKEFRE2MU3ENTWIDAENDYEPDLGHFRGAIRPEIRSFRELPQWIRDNKEQFMEKRV 175

Query: 181 WYCTGGVECEKFSGWMVREGFKDVGQLHGGIATYGKDPEVQGELWDGAMYVFDDRISVP 240 

+ YCTSG+RCEKPSGW+VHEGFKDVGQL GGIATYGKDEEVQG+LWDG myvfd ri+vp 
Sbjct: 176 LTYCTGGIRCEKFSGWLVREGFKDVGQLLGGIATYGKDPEVQaDLWDGQMYVFDSRIAVP 235 



Query: 301 

++ + LS QE ERL + + L 
Sbjct: 296 IKAHQLSNQEVQERIAIiLEKDL 317 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6453> which encodes the amino acid 
sequence <SEQ ID 6454>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2443 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 321/324 (99%), Positives = 323/324 (99%) 
Query: 1 MSEKIRVLLYYKXVSIENREEYAAKHLEFCKSIGriKGRILIADEGlNGTVSGDYETTQKY 60

MSESKIRVLLTyKWSIEm+EYjyUOJLEFCKSIGLKGRILIimEGIMGTVSGDXBTTQKy
Sbjct: 1 MSEKIRVLLYYKWSIENAQEYAAKHLEFCKSIGLKGRILIADEGINGTVSGDYETTQKT 60

Query: 61 ^C>WVHSDERFi!UDLWPKIDEENQQAFRKMFTOYKKEIVHI^EDMFDSDINPI:^^ 120 

MDWVHSDERFMLWFKIDEENQQAFRKMFVRYIOCEIVHLGLEDimFDSDINPLET^ 
Sbjct: 61 MDWVHSDERPADIiWFKIDEENQQAPRKMFVRYKKEIVHLGLEDNNFDSDINPLETTGEYL 120 

Query: 121 NPKQFKEMiLDEDTWLDTRra)YEYni^HFRGAIRPDIRNFRELPQWVRnNKDKFMEKRV 180 

NPKQPKEALLDEDTVVLDTKiroYEyDLGHPRGAIRPDIRNFRELPQWVRDNKDKFMEKRV 
Sbjct: 121 NPKQFKEALLDEDTVVLDTRlSroYEYDLGHFRG&IRPDIRNFREIiPQWVRDNKDKFMEKRV 180 

Query: 181 WYCTGGVRCEKFSGWMVREGFKDVGQLHGGIATYGKDPEVQGELWDGIWIYVFDDRISVP 240 

VVYCTGGVRCEKFSGWMVREGFKDVGQLHGGIATYGEa3PEVQGELWDG2WIYVFDDRISVP 
Sbjct: 181 WYCTGGVECEKFSGWMVREGFKDVGQLHCSGIATYGKDPEVQGELVIDGaMYVFDDRISVP 240 

Query: 241 INHWPTVISKDYFDGTPCERYWCMPFCNKQIPASEENEliKyVRGCSPECaWfflERNRY 300 

INHVNPTVISKDYFDGTPCERYVNCMPFCNKQIFASEBNE KYVRGCSPECRflHERNRY 
Sbjct: 241 INHVNPTVISKDYFKSTPCERYWCIANPFCNKQIFASEENETKYVEGCSPECIJM^ 300

Query: 301 VQBNGLSRQEWKERUEMGESLPQ 324 

VQENGLSRQEWAERLEA.lGESriP'l- 
Sbjct: 301 VQENGLSRQEWAERLEAIGESLPE 324 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2085 

A DNA sequence (GBSx2200) was identified in S.agalactiae <SEQ ID 6455> which encodes the amino
acid sequence <SEQ ID 6456>. Analysis of this protein sequence reveals the following:

Possible site: 57
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC83954 GB:L47648 putative [Bacillus subtilis]
Identities = 54/192 (28%) , Positives = 89/192 (46%) , Gaps = 14/192 (7%) 

Query: 5 QTIIIGAGRaGIGFGSAMQRLGLTNFLIIEKBHIGESFLRWPRTTQPITPSFTTMGPGFP 64 

+ IIIG G G+ ++++G+ + L+IEKG+1- S +P P + S 
Sbjct: 5 KAIIIGGGPCtaSflAIHLRQIGI-DBIjVIEKGNVVNSIYNyPTHQTFFSSSEKIiE 58 

Query: 65 DIJaVIPDTSPAFSFEKEHLSGVEYARYLQLVAaHYNLPIQNETSVLSIDK-RDSLPVIK 123 

ID AF E ++ Y + V N+ + V + K +++ FVI+ 

Sbjct: 59 IGDV--AFITEnRKPVRIQALSYYREWJa?KNIRVlIAFEMVRKVTKTQNNTFVIE 111 



Query: 124 TSKGDFSADYLIMATGEFQNPNTIDIKSADLSMHYGQVDNFHIKSDNPFIIIGGNESACD 183 
TSK ++ Y I+ATG + +PN + + G DL + H D ++IGG S+ D

Query: 184 ALTHLVYLGHQV 195 
A LV G +V 
Sbjct: 172 AALELVKSGARV 183
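
The identity count reported for an alignment can be recomputed from any pair of aligned rows. The Python sketch below is an illustration only, with hypothetical function names. It marks identical residues the way the middle line of the alignment does, but it does not reproduce the "+" marks for conservative substitutions, since those depend on a scoring matrix that is not stated in this document.

```python
def identity_midline(query, subject):
    """Build a middle line showing identical residues; other positions are blank."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, subject)
    )


def count_identities(query, subject):
    """Return (identical_positions, aligned_length) for two aligned rows."""
    midline = identity_midline(query, subject)
    return sum(ch != " " for ch in midline), len(query)


# Final block of the alignment above (Query 184-195, Sbjct 172-183).
query_row = "ALTHLVYLGHQV"
subject_row = "AALELVKSGARV"
print(repr(identity_midline(query_row, subject_row)))
print(count_identities(query_row, subject_row))
```

Summing the per-block counts over a whole alignment gives the numerator of the "Identities" figure; the denominator is the total aligned length including gap columns.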

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8973> and protein <SEQ ID 8974> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 5.05
GvH: Signal Score (-7.5): -3.14
Possible site: 57
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 0 value: 0.26 threshold: 0.0
PERIPHERAL Likelihood = 0.26 6
modified ALOM score: -0.55

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

33.2/56.1% over 281aa
Bacillus subtilis
EGAD|109228| hypothetical protein Insert characterized
GP|2635109|emb|CAB14605.1||Z99117 alternate gene name: yrdP Insert characterized
GP|1934657|gb|AAB80908.1||U93876 hypothetical protein YrdP Insert characterized
PIR|E69725|E69725 potassium uptake trkA - Insert characterized

ORF01799(310 - 1128 of 1725)
EGAD|109228|S2S56(2 - 283 of 345) hypothetical protein {Bacillus subtilis}
GP|2635109|emb|CAB14605.1||Z99117 alternate gene name: yrdP {Bacillus subtilis}
GP|1934657|gb|AAB80908.1||U93876 hypothetical protein YrdP {Bacillus subtilis}
PIR|E69725|E69725 potassium uptake trkA - Bacillus subtilis
%Match = 6.1
%Identity = 33.2 %Similarity = 56.0
Matches = 77 Mismatches = 88 Conservative Sub.s = 53



270 300 330 360 390 417 444 474 

YYC*LVKYFILHIYFCQGEDMKHYQTIIIGAGAAGIGFGSAMQRLGLTNFLIIEKGH-IGESFL-RWPRTTQFITPSFTT 
I Ihllll III I 1 = 1: = ! I :|lh 1= I = = = = 

40 MVDTIVIGRGQaGISIGYYLKQ-SDQKFIILDKSHEVQESWKDRYDSLVLFTSRMYSS 
10 20 30 40 50 

480 510 540 570 600 630 660 690 

NGFGFPDIOffliVIPDTSPAFSFEKEHLSGVEYARYLQLVAAHYlILPICJNETSVLSIDKRDSLFVIICrSKGDFS 

45 III I :: II: : :||| | |:|: | : |:|||:: :: 

LPGMHLEGEKHGFPSKNEIV AYLKKYVKKFEIEIQLRTEVISVLKIKNYFLIKIMREEYQ 

70 80 90 100 110 



ADYLIMATGEFQNPNTIDIKGaDLG MHYGQVDNF-HIKSDNPFIIIGGNESACDALTHLVYLGNQVELYTDTFGR 

l=:||l |: II I II :| I I I :== III i 
TKNLVIATGPFHTPNIPSIS-KDLSDNINQLHSSQYKNSKQLAYGNVLWGGGNSGA 



KESNPDPSISLS-PLTKERLKHIQ-DHKKEYYSISEGKKAI--EIKQIG 

:: hill: :: :| |: : ||::| ::| 

QIAVELSKERVTYLACSim,VyFPLMIGKRSIFWWFDKLGVLHASHTSIVGKFIQKKGDPVFGHELKHAIK 

180 190 200 210 220 230 240 

1068 1098 1128 1158 1188 1218 1248 

KQYQVTFDDGSTAESFHKPILSTGFLNTCHLIDGIALFEYDKNQLPIVTEDDESTIVNNCFLIGPSL 

II I I II I : I HII I |: ::: : : : : 

QKEI ILKKRVIAAKQNEIIFKDSSTLE-VNNI IWATGFRNPLCWINIKGVLDQEGRI IHHRGVSPVEGLYFI6LPWQHKR 
260 270 280 290 300 310 320

SEQ ID 8974 (GBS284) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 52 (lane 10; MW 42.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 9; MW 67.6kDa).

GBS284-GST was purified as shown in Figure 225, lane 7.

Example 2086

A DNA sequence (GBSx2201) was identified in S.agalactiae <SEQ ID 6457> which encodes the amino 
acid sequence <SEQ ID 6458>. This protein is predicted to be an NrgA-like protein. Analysis of this protein
sequence reveals the following:



>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = ? Transmembrane ? - 102 ( 82 - 109)
INTEGRAL Likelihood = ? Transmembrane ? - 340 ( 318 - 342)
INTEGRAL Likelihood = ? Transmembrane ? - 281 ( 265 - 282)

Final Results

bacterial membrane --- Certainty=0.5692 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9997> which encodes amino acid sequence <SEQ ID 9998> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15668 GB:Z99122 ammonium transporter [Bacillus subtilis] 
Identities = 105/378 (27%) , Positives = 181/378 (47%) , Gaps = 41/378 (10%) 

VKKGLFVFLLLCILSMWLMIFGVAFYYFGSLH-QSLTSRIIYQFVLTVLLTTTAWFMGAY 61 
++ G VF+ C L G+A +Y G + +++ S ++ F ++ + + W + Y 

MQMGDTVFMFFCALLVWLMTPGLALFYGGMVKSKNVLSTAMHSFS - SIAI VS IVWVLFGY 5 9 





3 


Sbjct: 






62 


Sbjct: 


60 




108 


Sbjct: 


120 


Query: 


168 


Sbjct: 


179 


(3uery: 




Sbjct: 


236 


CJuery: 


283 


Sbjct: 


296 



LL V W LVYTP+A+ +W 



+-K3 LDF+GG +VH+S+G++ +LA V 



+ K F DD + +FG++GIGG G -I 



Query: 327 ATTILLSIIMTYIISKAI 344

A T + I+T++1 K + 
Sbjct: 356 AATYVFVFIVTFVIIKIV 373

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8975> and protein <SEQ ID 8976> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 17.19
GvH: Signal Score (-7.5): -4.07
Possible site: 24
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 9 value: -11.73 threshold: 0.0
Likelihood =-11.73 Transmembrane
Likelihood = -6.42 Transmembrane
Likelihood = -6.42 Transmembrane 3
Likelihood = -5.26 Transmembrane 2
Likelihood = -5.10 Transmembrane 1
Likelihood = -1.49 Transmembrane 2
Likelihood = -1.17 Transmembrane 1
Likelihood = -0.43 Transmembrane
Likelihood = -0.00 Transmembrane 2
Likelihood = 0.26 152
modified ALOM score: 2.85



INTEGRAL 
INTEGRAL
INTEGRAL 



INTEGRAL 
INTEGRAL 
INTEGRAL 



207 - 229) 



246 - 263; 
183 - 199; 



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5692 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF01800(307 - 1332 of 1641)
EGAD|19589|BS3646(1 - 373 of 404) probable ammonium transporter {Bacillus subtilis}
OMNI|NT01BS4254 ammonium transporter
SP|Q07429|NRGA_BACSU PROBABLE AMMONIUM TRANSPORTER (MEMBRANE PROTEIN NRGA).
GP|143264|gb|AAA17399.1||L03216 membrane-associated protein {Bacillus subtilis}
GP|1684645|emb|CAB05374.1||Z82987 unknown {Bacillus subtilis}
GP|2636176|emb|CAB15668.1||Z99122 ammonium transporter {Bacillus subtilis}
PIR|A36865|A36865 ammonium transporter nrgA - Bacillus subtilis
%Match = 13.5
%Identity = 30.0 %Similarity = 54.8
Matches = 104 Mismatches = 149 Conservative Sub.s = 86



PFSMIRKFVSPNRCMREPKPIPARPAPIIMV**CFMSSP*QK*MCKIKniTS*Q*YSLTNKRVFVKKGLFVFLIjLCILSM 

:: I lh: = l I = 
MQMGDTVFMFFCJiLLV 



384 411 441 471 501 531 

VMIFGVAFTyFGSLH-QSLTSRIIYQFVLTVLLTnAWFMGRYFLRFEGHFKTVFQFQEADGKQI 

50 III |:|::| | : : : : | = : I = = = = I = II I I = == I 1 = 

WLMTPGLRLFYGGMVKSKNVLSTAI^lSP-SSIAIVSIVKra^FGVTLAFAPGNSIIGGLEmGLKGVGF 

30 40 50 60 70 80 90 



579 609 639 669 699 729 759 789 

55 VNCLFQLCFALYAVVMiIGSIIDRVQTKRLLIAWSWLFLVYTPIAYlIWNSEGVFAKMGVLDFSGCMIVHLSAGL 

: :||: ||: := 1= :|:: =11 I I lllll = |: =1 I :=] llhll :|b|:|:: = 
LFMMFQMTFAVLTTAIISGAPAERMRFGAFLLFSVLWASLVYTPVAHWWGG-GWIGQLGAIDPAGGNVAmiSSGU^ 
110 120 130 140 150 160 170 

60 819 849 873 ' 903 933 963 993 1023 

rAHVIGKSEHQHNKVKNDSLF--LGMILITFGWFGEtJMGPVGEWNSQAIMILIJSITIFAIIGGGLAWTL?^^ 
II hll : :: =: II II llllllhl = h ::|| I | | | :| :: I 

lAlVLGKRKDGTASSPHHLITTFLGGALIWFGWFGFNVGSALTIiDGVAMYAFINTlCTAAAlVGIJ^ 

190 200 210 220 230 240 250

1050 1080 1110 1140 1170 . 1200 . 1230 1260

-SIimGIIVGLVTSTJM3VGYLLTWQLLAVTPFASLFTyFVTDYmcaFAIDDWSSE!^ 

::| I III I |:: = : : :: | | || :||::|||| I = III = = 

5 LGAVSGaiaGLVAITPAACSFVTPFASIIIGIIGGaVCFWGVFSLKKKEGTOnaLDAFCSUIGlGGTWGGI^ 

270 280 290 300 310 320 330 



1272 13 02 1332 1362 1392 1422 1452 

V QLLALATTILLSIIMTYIISKAIFRK**IRLRCTSQPYLLF*QGE*LNRIINHFHY*TLS3aC* 

10 |::|:| | :: |:|::| | : 

SAGADGLFYGDASLIWKQIVAIAATYVFVFIVTFVIIKIVSIiFLPLRATEEEESLGLDLTMHGEKAYQDSM 
350 360 370 380 390 400 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2087 

A DNA sequence (GBSx2202) was identified in S.agalactiae <SEQ ID 6459> which encodes the amino
acid sequence <SEQ ID 6460>. This protein is predicted to be dUTPase (dut). Analysis of this protein
sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2731 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9471> which encodes amino acid sequence <SEQ ID 9472> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA72644 GB:Y11901 dUTPase [Lactococcus lactis] 
Identities = 67/144 (46%) , Positives = 90/144 (61%) , Gaps = 8/144 (5%) 

Query: 40 RGFELVSQB'SNKELLPKRETAHAAGYDLKVABCKTVIEPGEITLVPTGIKAYMQPGEVIiYL 99 
RGF+ + +P+R T H+AGYD+ ++ I+P EI +V TG+ + EVL L

Sbjct: 3 RGFK---KLDGNATIPERATKHSftGYDISASETVTIQPDEIKMVSTl3LAVQIiGDDEVLKL 59 

Query: 100 YDRSSNPRKKGlVLINSVGVIDGDYYNNQVNEGHIEaQMQKriTDQRVILEEGERIVQAVF 159 
YDRSSNP K+GI LINSVG+ID DYY + NI+ + V + +G+RI+Q VF 

Sbjct: 60 YDRSSNPVKRGIALIHSVGIXDSDYYPQEFK—



Query: 160 APFLIADDDQATGMRTGGFGSTGK 183 

H-L ODD A G RTGGFGSTG+ 
Sbjct: 115 VKYIiTIDDElNANGKRTGSFGSTGE 138 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6461> which encodes the amino acid
sequence <SEQ ID 6462>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2519 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 115/148 (77%), Positives = 125/148 (83%)

Query: 36 MSKVRGFELVSQFSNKELLPKEETAHAAGYDLKVAKKTVIEPGEITLVPTGIKAYMQPGE 95 

M+K+RGFELVS F+N +i:.LPKRET HAAGYDL VA+ I ?GEI LVPTG+KAYMQ GE 
Sbjct: 1 MTKIRGFELVSSFTNPDLLPKHETTHAAGYDLSVRKAOTIAPGEIKLVPTGVKAYMQDGE 60 

Query: 96 VLYLYDRSSNPRKKGIVLINSVGVIDGDYYNHQVNEGHIFAQMQNITDQAVILEEGERIV 155 

VLYLYDRSSNPRKKGI+LINSVCSVID DYY N+ NEGHIEAQMQNITD V L GEHIV 
Sbjct: 61 VLYLYDRSSNPRKRGIILINSVGVIIWJYYGNE7iNEGHIE2^MQNITDHPVTLAVGERIV 120 

Query: 156 QAVFAPFLIiRDDDQATOMRTGGPGSTGK 183 

Q VF PFL+AD DQR G RTGGFGSTG+ 
Sbjct: 121 QGVFMPFLIADGDQARGERTGGFGSTGQ 148 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2088 

A DNA sequence (GBSx2203) was identified in S.agalactiae <SEQ ID 6463> which encodes the amino 
acid sequence <SEQ ID 6464>. This protein is predicted to be RadA homolog (radA). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2628 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



++SI S R KT + EFNRVLGGGW GSLVLIGGDPGIGKSTLLLQVS QL+ + 
PITSIETSEEPRVKTQLGEFHRVLGGGWRGSLVLIGGDPGIGKSTLLLQVSAQLSGSSN 1 

+VLY+SGEES +Q ia:iR++RLG + ++ +ET+M+ I S P F+++DSIQT+ 

SVLYISGEESVKQTKIfiaDRLGINNPSLHVLSETDhffiYISSAIQEMNPSFVVV^ 1 

SPEVSSVQGSVSQVREVTAEIMQrjAKTimiATFIVGHVTKEGTIAGPRMLEroiTOTVLYF 2 

+++S GSVSQVRE TAELM++AKT I FIVGHVTKEG++AGPR+LEHMVDTVLYP 
QSDITSAPGSVS(3VRECTAEIMKIAKTKGIPIFIVGirraKEGSIAGPRIj:.EHMVinV^ 2 

EGERmXFRILRAVKOTiFGSTNEIGIFEMQSGGLVEVIiNPSQVFLEERLDGATGSAIVVT 2 
EGERHHTFRILRAVKKRFGSTNE+GIFEM+ GL EVnNPS++PLEER G+ GS+I + 
EGERHHTFRILRAVKNRFQSTNEMGIFEMREEGLTEVENPSEIFLEEJtSaGSafiSSITAS 3 

MEGTRPIiaEVQALVTPTVFGKAKRTTrai^ENRVSLimVLEKRaSLLLQNQDA^ 3 
MEGTRPIL E+QAL++PT FGN +R TG+D NRVSL+MAVLEKR SLLLQNQDAYIiK A 
MEGTRPII.VEIQALISPTSFGNPRRMATGIDHNRVSLLMAVLEKRVGLLLC2NC3DAYLKVA 3 

GGVKIOEPAIDIAVRVAIASSYKEKPTNPQESPIGEIGLTGElRRVTRIEQRINEaSKLG 4 
GGVKLDEPAIDriA+ ++IASS+++ P NP + FiaE+GLTGE+RRV+RIEQR+ EA+KLG 
L GGVKLDEPAIDLAXVISIASSFRDTPPNPADCFIGEVGLTGEVRRVSRIEQRVKEAAKLG 4 

Query: 417 FTKIYAPKNSLAQIEIPKGIDVIGVTTVSQVLK 449 



Query: 


1 


Sbjct: 


1 




61 


Sbjct: 


61 




117 


Sbjct: 


121 




177 


Sbjct: 


181 




237 


Sbjct: 


241 




297 


Sbjct: 


301 


Query: 


357 


Sbjct: 


361 


Query: 


417

A related DNA sequence was identified in S.pyogenes <SEQ ID 6465> which encodes the amino acid
sequence <SEQ ID 6466>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2191 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 416/453 (91%), Positives = 441/453 (96%)



Query: 


1 


Sbjct: 


1 


Query: 


61 


Sbjct: 






121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 


Query: 




Sbjct: 


241 


Query: 


301 


Sbjct: 


301 




361 


Sb j ct : 


361 


Query: 


421 


Sbjct: 


421 



MaKKK+ P CQECGYQSPKyLGRCENCSAWSSFVEEVEV+EVKNARVSL CSIKSRP KLK 



VSGEESAEQIKLRSERLGDIDNEFYL'XftETNMQ+IR+EIE IKPDFLI1DSIQT1MSP++ 



h VQGSVSQVREVTAELMQLAKIOTiaTFIVGHVTKEGTUiGPRMLEHMVDTVLYF^^ 



YAPKN+L GI+IP+GI+V+GVTTV QVL AVFS 
YAPKKALQGIDIPQGIEVVGVTTVGQVENAVFS 453 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2089 

A DNA sequence (GBSx2204) was identified in S.agalactiae <SEQ ID 6467> which encodes the amino 
acid sequence <SEQ ID 6468>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA97750 GB:Z73419 hypothetical protein Rv1284 [Mycobacterium tuberculosis]

Identities = 69/162 (42%), Positives = 100/162 (61%), Gaps = 2/162 (1%) 

Query: 3 TYFDNFDKIHQaYMIflGTAHLFIKPKTKVAIVTCMDSRLHVAQALGLALGDAHILRNAG 62 

T D++ti N YA LP+ P +AIV CMD+RL V + LG+ G+AH++RNAG 

Sbjct: 2 TVTDDYLANNVDYASGF-KGPLPMPPSKHIAIVACHDARLDVYRMLGIKEGEAHVIRNAG 60 

Query: 63 GRVTDDVLRSLVISQQQLGTREIWLHHTDCGAQTFTNEAFAAQLQRDLGVDMHGHDFLP 122 

VTDDV+RSL ISQ+ LGTREI++LHHTDCS TFT++ F +Q + G+ 
Sbjct: 61 CVVTDDVIRSIiaiSQRLLGTREIIIiI.HHTDCGMLTFTDDDFKRAIQDETGIRPTWSP-BS 119 

Query: 123 ETOIEESVEEDVAKIiHaSPLimDWISGMYITO3TGEMVEV 164 

+ D E VR+ + ++ +P + + G ++DV TG++ EV 

Sbjct: 120 YPDAVEDVRQSLERIEVNPFVTKHTSLRGEWDVATGKLNEV 161 

There is also homology to SEQ ID 6470: 

Identities = 126/164 (76%) , Positives = 146/164 (88%) 

Query: 1 MTTYFDNFLKTNQAYADLHGTflHLPIKPKTKVaiOTCm)SRIiHVaQfiLGIiALGDAHILRN 60 

+ +YF++P+ NQAY LHGTaHLP+KPKTK^ffilVTCmSRLHVAQfiLGIALGD^^HILF!N 
Sbjct: 1 LMSYFEHFMaflNQAYVALHGTAHXiPLKPKTKVaiWCMDSRLHVAQALGrMiGnM 60 



Query: 121 LPFNDIBESVREDVAKLHASPLIPDDWISGAIYDVDTGRMVEV 164 

LPF D+B+SVRED+AK+ AS LI DDWI+GA+YDVDTG+M +V 
Sbjct: 121 LPFQDVEDSVREDMAKIRASSLISDDWINGAVYDVDTGKMTQV 164 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2090 

A DNA sequence (GBSx2205) was identified in S.agalactiae <SEQ ID 6471> which encodes the amino 
acid sequence <SEQ ID 6472>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0535 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9473> which encodes amino acid sequence <SEQ ID 9474>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC73407 GB:AE000137 putative oxidoreductase [Escherichia coli K12]
Identities = 199/438 (45%), Positives = 286/438 (64%)

Query: 1 MKKYDVIVLGFGK&GKTLRaKIATQGKSVAMVEEDDKMYGGTCINIGCIPTKTLLVSaSK 60 

M KY +++GFGKAGmiA LA G VA++E+ + MYGGTCINIGCIPTKrL+ A + 
Sbjct: 10 MJKYC^VIIGFGKAGKTLAVTLAKAGWRVALIEQSNftMYGGTCINIGCIPTKTLVHnAQQ 69 



Query: 61 NHDFQEAMTTRl!^EVTSRLRAKNPA^aJDNKIWVDVYlM^ 120 
+ DP A+ +HEV + LR KNF L + +DV + +A PI+N + + E+
Sbjct: 70 HTDFVRAIQRKNEV\'NFLRNKNFHKLJiJDMPNIDVIDGQflKFINNHSLRVHRPEGI^ 129







121 


5 


Sbjct: 


130 




Query: 


IBl 


10 


Sbjct: 


190 


Query: 


241 




Sbjct: 


250 


15 








Sbjct: 


310 




Query: 


361 


20 








Sbjct: 


370 








25 




430 



• I INTGA++V PIPG+ + VYDST + L LP LGI+GGG IG+BFA++++ 



GSKVT++H-+ S 



L+ DA+L A+GR+P T L EN I + ERGAI VD+ T+ +NI+-A+GDV GG QFT 



YISLDD RIV + L + S +R VP S F PPL+ VG+ E+ A+E G 



V+A+PRA V ND RG+ K +VD +T +LGA L +SHE+INH- M MD +PY+ 



There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2091 

A DNA sequence (GBSx2206) was identified in S.agalactiae <SEQ ID 6473> which encodes the amino 
acid sequence <SEQ ID 6474>. This protein is predicted to be glutamyl-tRNA synthetase (gltX). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2245 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9475> which encodes amino acid sequence <SEQ ID 9476>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10953> which encodes amino
acid sequence <SEQ ID 10954> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC31971 GB:U49789 glutamyl-tRNA synthetase [Bacillus subtilis]
Identities = 273/491 (55%) , Positives = 353/491 (71%) , Gaps = 19/491 (3%) 

Query: 20 LAMKIRVRYAPSPTGLLHIGNARTALEimiYaRHHGGDFVIRIEDTDRKRHVEDGERSQL 79 
+ N++RVRYAPSPTG LHIGNaRTALENYL+AR+ GG P+IR+EDTD+KR++E GE+SQL

Sbjct: 1 MGNEVRVRyAPSPTGHLHIGNARTALENYLFARNQGGKFIIRVEDTDKKRNIEGQEQSQL 60 

Query: 80 ENIJlVin^mWDESPET- - -HENYRQSERLELYQRYIDQLLaEGKAYKSWTEEEIAAERE 136 
L+WLG+DWDES + + YRQSER ++y+ Y ++LL +G AYK Y TEEEL ERE 
Sbjct: 61 NYLKWLGIDWDESVDVGGEYGPYRQSERiroiYIWYYEELLEKGLAYKCYCTEEELEKERE 120

Query: 137 RQELAGETPRYmEFIGMSETEKEAYIAEREAAGIIPTVRLAVNESGIYKSmMVKGDIE 196 
Q -GE PRY + +++ E+E +IAE G P++R V E + + D+VKG+I
Sbjct: 121 EQIARGEMPRYSGKHRDLTQEEQEKPIAE GRKPSIRFRVPEGKVIAFNDIVKGEIS 17S

Query: 197 FEGSNIGGDWVIQKKIX3yPTyNFAWIDDHDMQlSHVIRGDDHIMrePKQLMVYE?iLGWE 25S

FE IG D+VI KKDG PTYNFAV IDD+ M+++HV+RG4-DHI+NTPKQ+M+Y+A GW+ 
Sbjct: 177 FESDGIG-DFVIVKKDGTPTWAVAIDDYLMKMTHVLRGBDHISNTPKQIMIYQAFGWD 235 

Query: 257 APQFGHMTLIINSETGKKLSKRDTNTLQFIEDYRKKGYMSEAVEKFIALLGWOT^ 316 

PQFGHMTLI+N E+ KKLSKRD + +QPIE Y++ GY+ BA+ENFI LLGW+P GEBE+ 
Sbjct: 236 IPQFGHMTLIVN-ESRKKLSKEDESIIQFIEQYKELGYLPEALFNFIGLLGWSPVGEEEli 294 

Query: 317 FSREQLIlttFDEaSRI^KSPAAFDQKKMDVMSiroYLKlMFESVFALCKPFLEEAG 373 

F++EQ I +FD NRLSKSPA FD K+ W++N Y+K D + V Ii P I1++AG++ 
Sbjct: 295 FTKEQFIEIFDVNRLSKSPALFDlfflKIOTIVlimQYVKKIJJIJDC^^ 354 

Query: 374 TDKREKLVELYQPQLKSftDEIVPLTDLFFiffiFPELTEaEKEVMAaETVPTVLSAF 428 

+ KL+ LY QL EIV LTDLFF D E + K V+ E VP VLS F 
Sbjct: 355 LSAEEQEWVRKLISLYHEQLSYGAEIVELTDIiFFTDEIEYNQERKAVLEEEQVPEVLSTF 414 

Query: 429 KEKLVSLSDEEFTRDTIFPQIKaVQKBIGIKGKISniiFMPIRIAVSGEMHGPELPDTIYLIfi 488 

KL L EEFT D I IKAVQKETG KGK LEMPIR+AV+G+ HGPELP +1 L+G 
Sbjct: 415 AAKIiEEL--EEFTPnNIKASIKAVQKETGHKGKKLFMPIRVRVTGQTHGPELPQSIELIG 472 

Query: 489 KEKSVQHIDMM 499 

KE ++Q + N+ 
Sbjct: 473 KETAIQRLKNI 483 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6475> which encodes the amino acid
sequence <SEQ ID 6476>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 434/481 (90%) , Positives = 459/481 (95%) 
Query: 

Sbjct: 

Query: 80 ENLRWLGMDWDESPETHENYRQSERLELYQRYIDQLLAEGKAYKSYVTEEELAAERERQE 139 

ENL+WLGMDWDESPETHENYRQSERL LYQ+yiDQLLAEGKAYKSYVTEEELAAERERQE 
Sbjct: 61 ENLKWLGMDWDESPETHENYRQSERLALYQQYIDQLLAEGKAYKSYVTEEELAAERERQE 120 

Query: 140 LAGETPRYINEFIGMSETEKEAYIAEREAAGIIPTVRLAVNESGIYKWTDMVKGDIEFEG 199 

AGETPRYINEFIGMS EK YIAEREAAGI+PTVRIAVNESGIYKWTDMVKGDIEFE6 
Sbjct: 121 AAGETPRYIIIIEFICMSaDEKaKYIAEREaAGIVPTVRIAVNESGIYKW^ 180 

Query: 200 SNIGGDWVIQKKDC5YPTYNFAWIDDHDMQISHVIRGDDHIANTPKQLMVYEALGWEAPQ 259 

NIGGDWIQKKDGYPTYlJFAW+DDfflMQISHVIR6DDHIAOTPKQIMVYEALGWEAP+ 
Sbjct: 181 GNIGGDWVIQKKDGYPTYNFAWVDDHDMQISHVIRGDDHIANTPKQLMVYEALGWEAPE 240 

Query: 260 FGHMTLIINSETGKKLSKRDTNTLQFIEDYRKKGYMSEAVFNFIALLGWNPGGEEEIFSR 319 

FGHMTLI INSETGKKLSKRDTNTLQFIEDYRKKGYM EAVFNFIALLGWNPGGEEEIFSR 
Sbjct: 241 FGHMTLIINSETGKKLSKRDTNTLQFIEDyRKKGYl.l?EAVFNFIALLGWNPGGEEEIFSR 300 

Query: 320 EQLINLFDENRLSKSPAAFDQKKiyiDtmSNDYLKNADFESWALCPCPFLEEAGRLTDKAE^ 379 

BQLI LPDEHRLSKSPA!VFDQKKMDWMSN+YLK+ADPE+V+ALCKPFLEEAGRLT+KaEK 
Sbjct: 301 EQLIiU^FDENRLSKSPAaFDQKKMDMMSNEYLKHRDFEIVYALCKPFIiEEM 360 



Query: 380 LVELyQPQLKSADEIVPLTDLFFADPPELTEAEKEVMAAETVPTVLSAPKEKLVSLSDEE 439
LVELY+PQLKSMEI+PLTDLFF+DFPELTEAEKEAMA ETV TVL AFK KL ++SDE+
Sbjct: 361 LVELYKPQLKSADEIIPLTDLFFSDFEEIiTEAEKEVMaGETVSTVLQAFKAKLEAMSDED 420

Query: 440 FTRDTIFPQIKAVQKETGIKGKNLFMPIRIAVSGE^mGPEr^OTIYLLGKEKSVQHID^JMI, 500 

F + IFPQIKAVQKBTGIKGKKILFMPIRIAVSGEMHGPELP+TI-niLG++KS++HI HML 
Sbjct: 421 FKPEMIFPQlKAVQKETGIKGKrmFMPIRIAVSGEMHGPELPNTIYLLGRDKSIEHIKNML 481 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2092 

A DNA sequence (GBSx2207) was identified in S.agalactiae <SEQ ID 6477> which encodes the amino 
acid sequence <SEQ ID 6478>. This protein is predicted to be d-ribose-binding protein precursor , fragment 
(rbsB). Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15613 GB:Z99122 ribose ABC transporter (ribose-binding 
protein) [Bacillus subtilis] 
Identities = 143/301 (47%), Positives = 205/301 (67%), Gaps = 1/301 (0%) 



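DR +E GKV T VASDNV G+MAA + KLGK AK EL GVPGASAT +RG GFH++ 

[Alignment row labels recovered from the original layout: Query rows begin at 14, 134, 194 and 254; Sbjct rows begin at 5, 65, 125, 185 and 244. The Query and Sbjct sequences of the first rows were not recoverable.] 
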
A KL +++ QSA+FDR K L ■)-N++QGH D+Q +FA NDEMALGA +A+ S-K3 +++ 
ADQKLQVVTKQSADFDRTKGLTVMENTJliQGHPDIQAVFAHNDEMA]^^ 243 

L++G DG DA +IK +SAT+AQQP +G++A +AA D GKKV+K +P+ L T+ 
LVIGFDGNKDAIASIKDRKLSATVRQQPELIGiOJiTEaaDDIIJIGKKVQia'ISAPLKLETQ 3 04 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 6478 (GBS203) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 12; MW 36.8kDa). 

GBS203-His was purified as shown in Figure 208, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2093 

A DNA sequence (GBSx2208) was identified in S.agalactiae <SEQ ID 6479> which encodes the amino 
acid sequence <SEQ ID 6480>. This protein is predicted to be galactoside ABC transporter, permease, 
protein (rbsC). Analysis of this protein sequence reveals the following: 
Possible site: 14 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-11.3 

INTEGRAL Likelihood = -3.66 Transmembrane 111 - 127 ( 110 - 128) 
INTEGRAL Likelihood = -2.'. 
INTEGRAL Likelihood = -2. A 
INTEGRAL Likelihood = -O.f 



Final Results 

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 9287> which encodes amino acid sequence <SEQ ID 9288> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15612 GB:Z99122 ribose ABC transporter (permease) [Bacillus subtilis] 
Identities = 144/211 (68%), Positives = 182/211 (86%), Gaps = 1/211 (0%) 

Query: 1 M(MiNGLFISYGKLAPFI\mATMriFRGRTBVYSN6NPITAGLSDSFLFQFU3QGYIVG 60 
25 +GM+NGL 1+ GK+APFI TLATMT+PRG TLVy++GtJPIT GL ++ FQ G+GY +G 

Sbjct: 113 LGMINQLLITKiQKMAPFIATLATMTVFRGLTLVYTDGNPIT-QLGTNYGFQMPGRaYFI.G 171 

Query: 61 IPFPVILMFLrPIILYILLHKTAFGKSV3(ALGGNEK2UiYISGIKIJ!IKV^ 120 
IP P I M L F+IL+H-LLKKT FG+ YA+GGNEKAA ISGIK+ +VK++IY+++G+++ 
Sbjct: 172 IPVPAlT^miRFVII*^VLLHmPFGRRTYAIGGlffiKRALISGIKVTRVKVM^YSLAGXlLS 231 

Query: 121 SISGLIITSRLSSAQPTAGasyEMDAIAAWLGGTSLSGGRGRIIGTLIGALIIGVLNNG 180 

+++G I+TSRL SAQPTAG SVE+DAIAAWLGGTSLSGQ+GRI+GTLIG LUG LNNG 
Sbjct: 232 ALAGAILTSRLHSAQPTftGESyELDAIAAWLGGTSLSGGRGRIVGTLIGVLIIGTLNNG 291 


Query: 181 LNIIGVSAFWQQWKGIVILMAVLLDRFFWA 211 

LN++GVS+F+Q WKGIVIL+AVLLDR K A 
Sbjct: 292 LNLLGVSSPYQLWKGIVILIAVLLDRKKSA 322 



A related GBS gene <SEQ ID 8977> and protein <SEQ ID 8978> were also identified. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 2094 

A DNA sequence (GBSx2209) was identified in S.agalactiae <SEQ ID 6481> which encodes the amino 
acid sequence <SEQ ID 6482>. Analysis of this protein sequence reveals the following: 



Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood Transmembrane 
INTEGRAL Likelihood = -0.64 Transmembrane 

Final Results 

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2095 

A DNA sequence (GBSx2210) was identified in S.agalactiae <SEQ ID 6483> which encodes the amino 
acid sequence <SEQ ID 6484>. This protein is predicted to be ribose transport ATP-binding protein rbsa 
(rbsA). Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.00 Transmembrane 401 - 417 ( 401 - 417) 

Final Results 

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15511 GB:Z99122 ribose ABC transporter (ATP-binding protein) 
[Bacillus subtilis] 
Identities = 297/493 (60%) , Positives = 375/493 (75%) , Gaps = 1/493 (0%) 

MKIDMRMISKSFGXWKVLEKIDLELQSGQIHALMGKNGAGKSTLMNILTGLFPASTGTIY 60 
M+I+M++I K+FG N+VL + +L G++HM,MGENGAGKSTLMNILTGL A G I 
MQIEMKDIHKTFGKNQVLSGVSFQLMPC-E\'HALMGENGaGKSTLMNirjTGLHKADKGQIS 60 





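[Alignment row labels recovered from the original layout: Query rows begin at 1, 61, 121, 181, 241, 300, 360, 420 and 480; Sbjct rows after the first begin at 61, 121, 181, 241, 301, 361, 421 and 481. After the first row, only the match lines below survive.] 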



I+G E FSNP+E1AE+ GI+PIHQE+N WPEMTVLENLF+G+EI H 



B F +L V++ IiD 6 SVGQQQMIEIAK+L+ +++MDEPTARLT+RE LF 



VI LK+ GV +VYISHRMEEIF I D +T+MRDG VDT S T4- DE+VKKMVGR+L 



+FEDVSFYVR GEI+G SGMGAGRTE+MR +FG+ 



I P +A+K+G+GF+TENRKDEGL+LD +I++N+ LP+ 



FV LI RL IK+ P+ +LSGGNQQKW+AKWIGI PKVLILDBP 



TRGVDVGAKREIY LMNEL +RGV I+MVSS+LPEILG+SDRI+V+HEGRISGE+ +EA 



QB++M IATGS+ 



There is also homology to SEQ ID 4678. 
SEQ ID 6484 (GBS407d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 147 (lane 2-4; MW 72kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 147 (lane 5 & 6; MW 47kDa). 

GBS407d-His was purified as shown in Figure 235, lane 9-10. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
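
The "Final Results" blocks in these examples list one certainty value per predicted location. As a minimal sketch (not part of the original analysis), the Python below parses such lines and picks the location with the highest certainty. The function names and the regular expression are illustrative assumptions; the expression deliberately tolerates the stray spaces and dash variants seen in this text.

```python
import re

# Hypothetical helper: matches lines such as
# "bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>"
# and tolerates spacing damage such as "Certainty=0. 1001".
RESULT_LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every result line found."""
    results = {}
    for location, value, verdict in RESULT_LINE.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict.strip())
    return results

def best_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])

# Values taken from Example 2095 above.
block = """
bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(results)
print(best_location(results))
```

Only the ranking of locations is derived here; how the certainty values themselves are computed is not stated in this text and is not modelled.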

Example 2096 

A DNA sequence (GBSx2211) was identified in S.agalactiae <SEQ ID 6485> which encodes the amino 
acid sequence <SEQ ID 6486>. This protein is predicted to be high affinity ribose transport protein rbsd 
(rbsD). Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2673 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15610 GB:Z99122 ribose ABC transporter (membrane protein) 
[Bacillus subtilis] 
Identities = 74/131 (56%), Positives = 95/131 (72%), Gaps = 1/131 (0%) 

Query: 1 MKKTGIlJSSHIJUaJUDDLGmDRVCIGDI^FVIMGIPKIDLSLTSGIPSF^ 60 
25 MKK GimSHIAK+ DIiSEm)++ I D GLPVP+G+ KIDLSL G+P+FQ+ + E 

Sbjct: 1 MKKHSimSHLAKIIiADI^TDKIVIftDRGLPVPDGVLKIDLSLKPGLPAFQDTA^ 60 

Query: 61 NILVEKVIIMEIKEaNPDQLSRIiAKLDNSVSIEYVSHim.KQMTQDVKAVIRTGENTP 120 
+ VEKVI A EIK +N + ++ L L + lEY+SH K +T+D KAVIRXGE TP 
30 Sbjct: 61 EMAVEKViaaaEIKASNQEN-AKFLENIiFSEQEIEYLSHEEFKLLTKDAKkVrRTGEFTP 119 

Query: 121 YSNIILQSGVI 131 

Y+N ILQ+GV+ 
Sbjct: 120 YANCILQAGVL 130 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
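
The alignments in these examples place a match line between each Query and Sbjct row, showing a letter where the two residues are identical. As a hedged illustration, the Python below rebuilds the identity part of that match line for the last row of the alignment above (Query 121-131 against Sbjct 120-130) and counts the identities. The function name is hypothetical. The `+` symbols that BLAST prints for conservative substitutions depend on a scoring matrix and are intentionally not reproduced, so those positions appear as spaces.

```python
def identity_midline(query, subject):
    """Return a match line with a letter at identical positions and a space elsewhere."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        q if q == s and q != "-" else " "
        for q, s in zip(query, subject)
    )

# Last alignment row of Example 2096 (Query 121-131, Sbjct 120-130).
query = "YSNIILQSGVI"
subject = "YANCILQAGVL"

midline = identity_midline(query, subject)
identities = sum(ch != " " for ch in midline)

print(repr(midline))
print(identities, "identities over", len(query), "aligned positions")
```

Summing these per-row counts over a complete, undamaged alignment would give the numerator of the `Identities = n/m` summary line; the rows in this text are too damaged by extraction for that total to be recomputed reliably.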

Example 2097 

A DNA sequence (GBSx2212) was identified in S.agalactiae <SEQ ID 6487> which encodes the amino 
acid sequence <SEQ ID 6488>. This protein is predicted to be ribokinase (rbsK). Analysis of this protein 
sequence reveals the following: 

Possible site: 47 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

= 4/293 (1%) 

MSNIVIIGSISMDLVMETNRIAKEGETVFGQRPSMVPGGKGflNQAVAIGRLSQERDNITI SO 
M NI +IGS SMDLV+ +++ K GETV G F VPGGKGANQAVA RL + + + 
MiO^ICVIGSCSMDLWrSDKRPKAGETVLGTSFQTVPGGKGAMQAVAAARLGAQ- - -VFM 57 

I^IGEDSFGPILLDNmiCNHVTTDFVGTIP-SSSGVAQITLYtDIDNRIIYCPGANGKVD 119 
+G +G+D +G +L+NL W V TD++ + + SG A I L DN 1+ GAW + 



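[Alignment row labels recovered from the original layout: Query rows begin at 120, 180 and 240; Sbjct rows begin at 58, 118, 178 and 238. After the first rows, only the match lines below survive.] 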



I++ D+V++Q EIP + ++ +C H I ^ 



h lA YP KL +T G +G YS C 



TGiAGDTEN AF A+++ I ALRFA AA LSV FGAQGGMPT E+E+ 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2098 

A DNA sequence (GBSx2213) was identified in S.agalactiae <SEQ ID 6489> which encodes the amino 
acid sequence <SEQ ID 6490>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2272 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9477> which encodes amino acid sequence <SEQ ID 9478> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15608 GB:Z99122 transcriptional regulator (LacI family) 
[Bacillus subtilis] 
Identities = 141/327 (43%) , Positives = 204/327 (62%), Gaps = 4/327 (1%) 

Query: 13 MSTIRQVAEKAGVSTSTVSRYISQKIGYVSQKASQKIEQAIREIJreVHSrFLAQSLKTKKNQ 72 

M+TI+ VA AGVS +TVSR ++ NGYV ++ ++ A+ +L+Y EN +A+SL ++++ 
Sbjct: 1 MRTIKDVAG»J«WS\fATVSRKna©NEfVHBETRTRVIARJ^^ 60 

Query: 73 LVGIiLPDISNPFFPRIJaiGVEEFrJKEQGYRVMIiGNTONKSHLEEEYLNVLLQSN^ 132 

L+GLLLPDI+NPFFP+LARG E+ L +GYR++ GN-H+ + E EYIi Q++ AGII 
Sbjct: 61 LIGI.LLPDITNPFFPQIJ«GAEDE]:J]REGYRIiIFGMSDEELKKELEYLQTFKQI!n^^ 120 

Query: 133 — TTHDFTKNHPEIDIPVVVVDRVNQETQYGVFiSDNKEGGKIiRAQaiWTAGaTNILLIRG 190 

T + + + ++ PW +DR E V SD G KLAAQAI + I L+RG 
Sbjct: 121 AATim>DLEEYSCayiNYPVVFLDR-TLEGAPSVSSDGYTOVKLaAQAIIHGKSQRITLLRG 179 

Query: : 



WO 02/34771 PCT/GB01/04789 



P RF G+ L F + ++ASF + Q AK L +P D -t-IA + 

Sbjct: 180 PA-HIiPTAQDRFNC3ALEIIiKQftEVDFQVIETASFSIKDflQSMAKELERSYPATDGVi;^ 238 

Query: 251 DIHAIAYLHEILNRGKRIPEIWQIiaYDDIIiMSQFIYPSLSTIHQSS'iflMGQKAAELIFK 310 

DI A A LHE L RGK +PED+QIIGTDDI S ++P LSTI Q +y MG++AA+I1+ 
Sbjct: 239 DIQAAAVLHEALRRGKHVPEDIQIIGYDDIPQSGLLFPPLSTIKQPAXDMGKEAAKLLLG 298 

Query: 311 ITOQLPITNKRIKLPVHYVEaRETLRRK 337 

1 + P4- I++PV Y+ R+T R++ 
Sbjct: 299 IIKKQPIiRETAIQMPVTYIGRKTTRKE 325 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6491> which encodes the amino acid 
sequence <SEQ ID 6492>. Analysis of this protein sequence reveals the following: 
Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1657 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/328 (70%) , Positives = 274/328 (82%) 

Query: 10 GVSMSTIRQVAEKAGVSTSTVSRYISQNGYVSQKASQKIEQAIRELHYVPNFLAQSLKTK 69 

G +M TI+QVAE+AGVS STVSRYISQ GYVS A KI+ AI tLHY PN LAQSLKTK 
Sbjct: 14 GKAMVTIKQVAEEAGVSRSTVSRYISQKGYVSDDARHKIKAAIAKLHYTPNVLAQSLKTK 73 

Query: 70 KNQLVGLLLPDISNPFPPRIARGVEEPLKEQGYR\M^GNTIJNKSHLEEEYLNVLLQSNAA 129 

KNQLVGLLLPDISNPPPPRLARG EE+LKE+CSYRVMLGN ++ LEEEY++VLI,QSNAA 
Sbjct: 74 KNQLVGriDLPDISNPFFPRLARGREEYLKEKGYRVmiGNISDSEALEEEYVITOiLQSISM 133 

GIITTHDFTK +P + IPVVVVDRV+QETQYGVFSDN+ GG LABQ +W AGA +LLIR 
Sbjct: 134 GIITTHDFTKRYPTIAIPWVVDRVDQETQYtWFSDNRAGGLriAAQTVWQAGAKEVIjLIR 193 

Query: 190 GPLDKADNLNQRFQGSQNYiraKGACFAIEDSASFDFAEIQIEAKTLLDHHPDIDSIIAP 249 

GPLD A+N+N+RF+ S +YL + + DS +FDF IQ+EA L +P IDSIIAP 

Sbjct: 194 GPLDNAENINERFEASFSYLQKQDVTMYVCDSQNFDFESIQLEASYNLKCYPTIDSIIAP 253 

Query: 250 SDIHAIAYLHEILNRGKRIPEDVQIIGYDDILMSQFIYPSLSTIHQSSYIMGQKAAELIF 309 

SDIHAIAY+HE+ ++GK+IP+DVQIIGYDDII*!SQFIYPSIiSTIHQSSY+MG+ AAEL++ 
Sbjct: 254 SDIHAIAYIHELHSQGKKIPQDVQIIGYDDILMSQFIYPSLSTIHQSSYIMGRYAAELVY 313 

Query: 310 KITNQIjPITNKRIKLPVHYVERETLRRK 337 

I +QL + RIKLPVHSVERET+R++ 
Sbjct: 314 TIASQLTVKAMRIKLPVHYVERETIRKR 341 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
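
Each homology hit above is summarised by a line such as `Identities = 141/327 (43%), Positives = 204/327 (62%), Gaps = 4/327 (1%)`. As a small sketch, the Python below parses such a line and recomputes the percentages from the counts. The function names are hypothetical, and the use of truncating integer division is an assumption: it reproduces the percentages printed for this hit, but some summary lines in this text do not match it, possibly because of extraction damage to the digits.

```python
import re

# Hypothetical parser for one BLAST-style summary line.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\)"
)

def parse_summary(line):
    """Return {field: (count, alignment_length, printed_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }

def truncated_percent(count, length):
    """Percentage with the fractional part dropped (assumed convention)."""
    return (100 * count) // length

# Summary line of the Bacillus subtilis hit in Example 2098.
line = "Identities = 141/327 (43%) , Positives = 204/327 (62%), Gaps = 4/327 (1%)"
stats = parse_summary(line)

for name, (count, length, printed) in stats.items():
    print(name, f"{count}/{length}", "printed:", printed,
          "recomputed:", truncated_percent(count, length))
```

A mismatch between the printed and recomputed percentage is a cheap way to flag summary lines whose counts were corrupted during text extraction.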

Example 2099 

A DNA sequence (GBSx2214) was identified in S.agalactiae <SEQ ID 6493> which encodes the amino 
acid sequence <SEQ ID 6494>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.80 Transmembrane 27 - 43 ( 24 - 51) 

INTEGRAL Likelihood =-10.61 Transmembrane 337 - 353 ( 329 - 362) 

INTEGRAL Likelihood = -9.18 Transmembrane 257 - 273 ( 249 - 276) 

INTEGRAL Likelihood = -8.92 Transmembrane 302 - 318 ( 291 - 326) 

Final Results 

bacterial membrane --- Certainty=0.6519 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8979> which encodes amino acid sequence <SEQ ID 8 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
SRCFLG: 0 

McG: Length of UR: 4 

Peak Value of UR: 3.20 
Net Charge of CR: 1 

McG: Discrim Score: 5.06 

GvH: Signal Score (-7.5): 0.0500002 

Possible site: 46 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 47 
ALOM program count: 3 value: -10.61 threshold: 0.0 

INTEGRAL Likelihood =-10.61 Transmembrane 326 - 342 ( 318 - 348) 
INTEGRAL Likelihood = -9.18 Transmembrane 246 - 262 ( 238 - 265) 
INTEGRAL Likelihood = -8.92 Transmembrane 291 - 307 ( 280 - 315) 

PERIPHERAL Likelihood = 4.98 152 
modified ALOM score: 2.62 
icml HYPID: 7 CFP: 0.525 



Final Results 

bacterial membrane - 
bacterial outside - 
bacterial cytoplasm - 
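
The detailed output for the related GBS sequence above reports `ALOM program count: 3 value: -10.61` together with three `INTEGRAL` lines. As a minimal sketch, the Python below parses such lines into transmembrane segments and reports their number and the most negative likelihood, which for this block coincides with the ALOM count and value. The names are illustrative assumptions, and the correspondence is an observation about this block, not a statement of how ALOM itself computes its value.

```python
import re

# Hypothetical parser for lines such as
# "INTEGRAL Likelihood =-10.61 Transmembrane 326 - 342 ( 318 - 348)".
INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_segments(text):
    """Return a list of (likelihood, start, end) tuples, one per INTEGRAL line."""
    return [
        (float(likelihood), int(start), int(end))
        for likelihood, start, end in INTEGRAL_LINE.findall(text)
    ]

# The three INTEGRAL lines reported for the related GBS sequence.
block = """
INTEGRAL Likelihood =-10.61 Transmembrane 326 - 342 ( 318 - 348)
INTEGRAL Likelihood = -9.18 Transmembrane 246 - 262 ( 238 - 265)
INTEGRAL Likelihood = -8.92 Transmembrane 291 - 307 ( 280 - 315)
"""
segments = parse_segments(block)
count = len(segments)
strongest = min(likelihood for likelihood, _, _ in segments)

print(count, strongest)
for likelihood, start, end in sorted(segments, key=lambda seg: seg[1]):
    print(f"residues {start}-{end}: likelihood {likelihood}")
```

Sorting by start position, as in the final loop, lists the predicted segments in sequence order rather than in the likelihood order used by the report.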

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF12525 GB:AE001863 hypothetical protein [Deinococcus radiodurans] 
Identities = 103/352 (29%), Positives = 191/352 (54%), Gaps = 9/352 (2%) 

Query: 15 AWKELTFYKKKYLLIELLIIVMMFMWFLSGLANGLGRAVSAAIENNPAQTYIHvIEGAEQ 74 

A +EL K + LLI ++ ++ FMV L+GL GL R ++ + + PAQ+++ + A+ 
Sbjct: 4 ALRELQHQKLRSLLIGGIVALIAFm'FMLTGLTRGLSRDSASLLLDTPAQSFVTTKEADG 63 

Query: 75 VITSSVLTTKDQTDLNSLNLKDSTTLNIQRSSDTRQGHEKKIDISYFAIDKDSFMAPTLS 134 

V+ S L+ + 4-++L + ++ ++ +K++ +D F+AP +S 

Sbjct: 64 VLNRSFLSPEQ---VSALQQDNEDAAAFAQTFVSFSHGDKQLSGVLLGVDPRGFLAPDVS 120 

Query: 135 EGKQLTSYKKAIILNDSLKAEGIKLGDKVIDKSSSISLTWGFVHNSMYGHGPVAFIDKD 194 

EG+ L A++ ++SL+ +G+K+GD + K S L V GF ++ H P ++ 
Sbjct: 121 EGQTLRVAGGAW-DESLREDGVKVGDVLTLECPSGDQLRVSGFTRSRRLNHQPGMYVSLA 179 

Query: 195 lYTEIinCKINPQyQFLPQALVMKNDKSISHLP-TQLEAVSKKDVIQHIPGySaEQSTLNM 253 

+ +K+NP+ A+ + + H-L L ++ +Q +PGy EQ H-L M 

Sbjct: 180 RW QKMIPRmGTVNAVaLPflaPAQVHIiGGMLSVTNRAQTLQVLPGyKEEQGSLTM 235 

Query: 254 ILVIVLVVRSAGILGVFFYIITLQKRHEFSVMKAIGTKMSEIALFQiLSCVIIIALFGIIVG 313 

I L+ +A +L FFY++TLQK +F ++KRIG +A ++Q++IL L + + 

Sbjct: 236 IQVFLIAVaAFVLATFPYVMTLQKTAQFGLLKRIGASNRTLAGSWAQMLILTLLAVAIA 295 

Query: 314 DGLAVRLSyVLPAQMPFVINWQNIILVSFVFLVIAMlSSALSIVKVaKIDPV 365 

+ + + +LPA MPF + NI S + LV+A ++S LS+ +VAK+DP+ 
Sbjct: 296 AAVTLGMVQLLPAGMPFHLTAANIASASGLLLWAALASLLSVRRVAKVDPL 347 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6495> which encodes the amino acid 
sequence <SEQ ID 6496>. Analysis of this protein sequence reveals the following: 



Possible site: 58 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-12.31 Transmembrane 246 - 262 ( 233 - 270) 

INTEGRAL Likelihood = -8.49 Transmembrane 327 - 343 ( 321 - 351) 

INTEGRAL Likelihood = -1.01 Transmembrane 301 - 317 ( 301 - 317) 

Final Results 

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAF12525 GB:AE001863 hypothetical protein [Deinococcus radiodurans] 
Identities = 101/360 (28%), Positives = 175/360 (48%), Gaps = lX/360 (3%) 

Query: 1 MFLAIiNEMKQSKLRYGLIAGLLCLVAYLMFFLSGLAFGLMQEaSRSAVDLWKADSVLIiRKD 60 

M+IAL E++ KIiR LI G++ L+A+++F L+GL GL +++ S + A S + K+ 
Sbjct: 1 MYLALRBLQHQKLRSLLIGGIViUJIAFMVFMLTGLTRGLSRDSASIiIiLOTPAQSFV^ 60 

Query: 61 ADATLTLSQVSRRQENQITADK?aPLAQUm;AWSVKHPKIM)IWKVSLPGIDSNSFIRP 120 

ADLS+SQ + +D AT K VL G+D F+ P 

Sbjct: 61 ADGVLNRSFLSPEQVSALQQD1JEDAAAFAQTFVSFSHGDKQLSGV---LLGVDPRGFIAP 117 

Query: 121 NIVKGRLFKTNKEWLDQSLAKEEAFAIGKDFYTSSSSQALTIVGYTQNARFSVAPWyM 180 

++ +G+ + V+D+SL +E+ +G S L + G+T++AR + P +Y+ 

Sbjct: 118 DVSEGQTLRVAGGAWDESL-REDGVKVGDVLTLKPSGDQLRVSGFTRSARMHQPGMYV 176 

Query: 181 NLEAFETLKyGEPLPKDKQVVMAFITKGS--LTDYPKKDFQKLDIKTFITKLPGYSAQLL 238 

+L ++ L P+ VMA + + D + + LPGY + 

Sbjct: 177 SLARWQKLN ERMHGTVNAVALPAAPAQVNICGADLSVTNEAQTLQVLPGYKEEQG 231 

Query: 239 TFGFMISFLVIISAIIIGIFMYILTIQKAPIFGIMKaQGISNKTITTAVLMQTFFLSFLG 298 

4- + FL+ +.t-A ++ F Y++T+QK FG++KA G SN+T+ +V+ Q L+ L 
Sbjct: 232 SLTMIQVTliIAVAAFVLATFFYVOTLQKITiQFGLLKAIGASNRTIAGSVVAQm^ 291 

Query: 299 SGLGLLGTWLTSLLLPTWPFQSNWFLYIAIFVSMICFALLGTLFSVFNIIRIDPLKAIG 358 
+ T LLP +PF + ++ A L +L SV + ++DPL A+G 

Sbjct: 292 VAIAAAVTLGMVQLLPAGMPFHLTAANIASASGLLLWAALaSLLSVHRVAKVDPLIALG 351 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 96/356 (26%) , Positives = 178/356 (49%) , Gaps = 4/356 (1%) 

Query: 15 AWKELTFYKKKnjLIELLIIVimFMVVFIiSGLKNGLGRAVSaAIENNPAQTYIIiNEGaEQ 74 

A E+ K +Y LI I.+ ++ +++ FLSGLA GL 4- +A++ A + +L + A+ 
Sbjct: 4 AIJSEMKQSKUlYGLIAGLLCLV&YLMFPLSGLAFGLMQEISlRSAVDLWKaDSVIi^^ 63 

Query: 75 VITSSVLTTKDQTDIiNSIiIimKDSTTimQRSSLTRQGHEKKIDISYFAIDKDSFiy^ 134 

+T S ++ + + + + ms S+ K+ +S F ID +SF+ P + 

Sbjct: 64 TLTLSQVSRAQENQITADroaPIAQLimffiWSVKNPKDADKVKVSLFGIDSNSFIRPN^ 123 

Query: 135 EGKQLTSYKKAIILNDSLKAEGIKLGDKVIDKBSSISLTVVGFVHNSMYGHGPVAFIDKD 194 

+6+ + K+ ++ K E +G SSS +LT+VG+ H+ + PV +++ + 

Sbjct: 124 KGRIIFKTNKEVVLDQSLaKEEAFAIGKDFYTSSSSQALTIVGYTQISB«FSVAPVVY^ra^ 183 

Query: 195 lYTEIN-KKINPQYQPLPQALVMKNDKBISHLPTQ-LEAVSKKDVIQHIPGYSAEQSTLH 252 

+ + +P+ + +A+K S++P+++K1 +PGYSA+ T 
Sbjct: 184 AFETLKYGEPLPKDKaVVI!aFITKB--SLTDYPKKDFQKIiDIiCrFlTKLPGYSflQLLTFG 241 

Query: 253 MILWVLVV2iSAGIL3VPFyiITLQKRHEFSVMKAIGTKMSEIALFQLSQVIILBLFGIIV 312 

++ LV+ SA I+G+F YI+T+QK F +MKA G I L Q L+ G + 

Sbjct: 242 FMISFLVIISAIIIGIFMyiLTIQKAPIPGIMKAQGISNKTITTAVLMQTPFLSFLGSGL 301 

Query: 313 GDGLAVALSYVLPAQMPFVIHWQNIILVSFVFLVIAMISSALSIVKVAKIDPVEVI 368 

G S +LP +PF NW + + + A++ + S+ + +IDP++ I 

Sbjct: 302 GLLGTWLTSLLLPTWPFQSNWFLYLAIFVSMICFALLGTLPSVFNIIRIDPLKAI 357 
SEQ ID 8980 (GBS239) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 175 (lane 13; MW 64kDa). 

GBS239-GST was purified as shown in Figure 227, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2100 

A DNA sequence (GBSx2215) was identified in S.agalactiae <SEQ ID 6497> which encodes the amino 
acid sequence <SEQ ID 6498>. This protein is predicted to be heterocyst maturation protein (devA) 
(b0879). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1751 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3 AILELKHISKHYPDGDBLLSILDNLDLSVSAGEFVAILGPSGSGKSTLLSIAGLLLGMQ 62 

A+4 +K ++ +Y G Ili +++L + GE V + GPSGSGK+TLLS+ G L + 

Sbjct: 5 AVIAIKSLNHYYGKGALKRQILFDINLEIYPGEIVIMTGPSGSGKTTLLSLIGGLRSVQE 64 

Query: 63 GSLYTOHENVTDLSQRQRTQLRREftLGFIFQSHQLLPYLTIQEQLQQEARFAKHYDKKTS 122 

G+L ++ SQ + Q+RR ++G+IFQ+H LL +LT ++ +Q +H ++ + 

Sbjct: 65 GNLQFLGVELSGASQNKLVQIRR-SIGYIFOaHISnXGFLTARQNVQMAVEaaffiHISQEEA 123 

Query: 123 LEEINKliSDIfilEaCfiHKYIWQI^GGQKQRAAIARAFINHPKVIIJmEFrASIiDEERGR 182 

+ + +L +G+E YP+ LSGGQKQR AIARA +N+P ++LaDEPTA+LD++ GR 

Sbjct: 124 lAKAEAMLKAVGLENRVDYYPDNLSGGQKQRVAIARALVNNPPLVEMEPTAALDKQSGR 183 

Query: 183 QVTELIRQEVKSHNTAAIMVTHDERVLDLVDTVXRLKDGKLVICEN 227 

V B++++ K T+ ++VTHD R+LD+ D + ++DG L +++ 
Sbjct: 184 DWEIMQRLAKDQGTSILLVTHDNRILDIADRIVEMEDGILARDS 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6499> which encodes the amino acid 
sequence <SEQ ID 6500>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4181 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 103/224 (45%) , Positives = 149/224 (65%) , Gaps = 4/224 (1%) 

Query: 3 AILELKHISKflYPDGDEIiSIUDNLDLSVSAGEFVAILGPSGSGKSTLLSIAGIiLLGADQ 62 

++L K ++K + DG ++ L D S+ AGBFVRI+GPSGSGKST L+IAG L 
Sbjct: 3 SVLTFKQVTKTFQDGHHEINALKATDFSIEAGEFWAIIGPSGSGKSTFIiTIAGGLQTPSS 62 

Query: 63 GSLYVNHENVTDLSQRQRTQLiaiEALGFIFQSHQLLPYLTIQEQlXXJEARPAKHYDKICrS 122 
G Ii ■++• + T LS+++R++IjR +++GFI Q+ I1+P+ T+Q+QL+ H 
Sbjct: 63 GQLIIDGTDYTHLSEKERSRLRFKSVGFILQASNLIPFSTVQQQIjE LVDHLTGSKE 118 

Query: 123 LEEINKLLSDI^IBQaUIKrPNQLSGGQKQRAAIARRFIlffiPKV-ILADEPTftSIJJEERGR 182 . 

+ N+L DLGI H+ P +LSGG++QRAAIARA + P +ILADEPTftSIiD E+ ' . 

Sbjct: 119 KRKANQLFDDLGITGLKHQLPQELSGGERQRA&IARM,raDPALlLADEPTASLOTEK&Y 178 

Query: 183 QVTEIiIRQEVKSHNTAAIiyOTmERWiDLVDTVYRLKIXSK^ 226 

+V +L+ +E K N A IMVTHD+R+L D VYR++DG+L +E 
Sbjct: 179 E^nmlllMESKEKl«ailI^WTHDDRmJKSCI)KVyRMQDGELCQE 222 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2101 

A DNA sequence (GBSx2216) was identified in S.agalactiae <SEQ ID 6501> which encodes the amino 
acid sequence <SEQ ID 6502>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2645 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB64972 GB:AJ012050 VicR protein [Enterococcus faecalis] 
Identities = 86/229 (37%) , Positives = 132/229 (57%) , Gaps = 10/229 (4%) 



Query: 3 KILVVEnNIVQQKIITTKLTQEayQFITASNGQEALNCLDTEEVQLIITDI^1MP^IMD6YQ 62 

KILWh-D +1+ I. +EGY+ TA +G+EftL ++ E LI I D+M+P MDG + 

Sbjct: 52 KILWDDEKPISEIVOMLVKEGYEVFTAYDGEEALEKVEEVEPDLIILDLMLPKMDGLE 111 

Query: 63 LIQELRSAAYNVPIIYMTAKSQMEDMTKGFGLGADDYMVKPVQLQELALRIKaLLRR--- 119 

+ +E+R +++P1I++TAK D G LGADDY+ KP +EL R+KA IjRR 
Sbjct: 112 VAREVRK-THDMPIIMVTAKDSEIDKVLGLELGADDYVTKPFSNRELVARVKMJLRRGAT 170 

Query: 120 ANIVAQHQLIIGNTCLNEDELSLKYFEQEIIFPQKEFRVLFHLLSYPNRIFTRLEL 175 

A + Q +L IG+ ++ D + ++I +EF +L++Ii + ++ TR L 
Sbjct: 171 NAKEAEVTTQSELTIGDLTIHPnAYMVSKRGEKIELTHREFELLYYLAKHIGQVMTRBHL 230 

Query: 176 UJSIWGMDTDLDERWDACINKIRRKVEHLPDFK— lETVRGVGYRAKN 222 

L ++WG D D R VD + ++R K+E P + T RGVGY +N 

Sbjct: 231 LQTVWGYDYFGDVRTVDVTVRRLREKIEDSPSHPTYLVTRHGVGYYIiRN 279 



There is also homology to SEQ ID 1182. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2102 

A DNA sequence (GBSx2217) was identified in S.agalactiae <SEQ ID 6503> which encodes the amino 
acid sequence <SEQ ID 6504>. This protein is predicted to be sensor protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 38 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -8.97 Transmembrane 53 - 69 ( 47 - 77) 

Final Results 

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC62214 GB:AF049873 sensor protein [Lactococcus lactis] 
Identities = 97/307 (31%) , Positives = 169/307 (54%) , Gaps = 16/307 (5%) 

Query: 57 iSaLAWFLSLVIASISMWYGSYHLTKPILDISHIVSNVaiXSDFEGHIYENSHERKSYEYY 116 

+ IAV4- • +L++ + S++Y + +T+P-I-L I +A GD + N+ 
Sbjct: 170 AVIAVI--TLIVTAFSimTRTVTRPLLKIKLGTDKIAQGDLSIQLNVMTE 219 

Query: 117 IffiLDELSBSlNQMIVSLSHMDHMRKDFITNVSHEtjKrPIJ^i^^ 176 

+EL EL++S1 + L M R +F+++V+HEL+TP+ + ++ E ++ 

Sbjct: 220 r 



Query: 237 NFQLDSKPYTVYSNSDLLM--QVWINLLDNAIKYSEDIVDLSVRMEETNNHYLRVIISDK 294 

NF L S Y+N D + QV +NLL NA KYS D D+ + ++ +++ISDK 

Sbjct: 340 NF-LISC3EGKFYANIDFMRIEQVLVN1,IjMNAYKYEADESDIK1AFIPEKENF-KIVISDK 397 

Query: 295 GRGISQYDVQHIFDKFYQADQSHNQQ- -GNGISLAIVKRIIVLCKGRISVSSQLEIGTEF 352 

G GI + D+ +IF++FY+ D+S + G GIiGLAIV+ 1+ G+I V S GT F 
Sbjct: 398 GEGIPEQDLPYIFERFYRVDKSRTRTTGGVGISLAIVQDIVKKHNGiaiVESIQNQGTTF 457 

Query: 353 CVELPLS 359 

+EI1P S 
Sbjct: 458 IIELPYS 464 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8981> and protein <SEQ ID 8982> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 4.84 
GvH: Signal Score (-7.5): 0.179999 

Possible site: 35 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 1 value: -8.97 threshold: 0.0 

INTEGRAL Likelihood = -8.97 Transmembrane 50 - 66 ( 47 - 77) 
PERIPHERAL Likelihood = 1.27 324 
modified ALOM score: 2.29 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

31.9/57.3% over 293aa 

Lactococcus lactis 

GP|3687664| sensor protein Insert characterized 
ORF01881(47B - 1377 of 1677) 

GP|3687664|gb|AAC62214.1||AF049873(171 - 464 of 464) sensor protein {Lactococcus lactis} 
%Match = 12.9 
%Identity = 31.9 %Similarity = 57.3 

Matches = 94 Mismatches = 121 Conservative Sub.s = 75 



fTFFQGVLTTHVLQVSALAWFLSLVIASI SMWYGSYHLTKP 
I :: : : : \ : : ■.■.]■.: : \::\ : :|:| 

EKKNKKESLHFHWUSDKYIVSKSRIQSNGKIVGSVYMFI^TRPIQKMVENFTGIFAVIiAVITIjIVTAFSIFYITRTVTR^ 



I =:= 1 II Ihl : == =:: = : I | : | | : : =: : Mill I I 

lANRSTTSLEDKTQYmilREESRHLTQIMDIJlNLaQI^ENGFKVEKHQVLIQELlNEWS^^ 
20 280 290 300 310 320 330 340 

1059 1083 1113 1143 1173 1203 1233 

PYTWSNSDLL--MQVWINIiiaDiaiKySEEIVDLSVRMEEraNH^ 

|:| h: II :||| II III 11== == -HIIII II = 1= Hl-lh 1 = 1 : 

25 EGNFYJ™IDFMRIE□VLV^mLMmYKYSADESDIKIAFIPEKENF-KIVISDKG^ 

3S0 370 380 390 400 410 420 

1287 1317 1347 1377 1407 1437 1467 1497 

-C3MGIX3IAIVKRIIVLCKGRISVSSQLEIGTEPCVELPLS*LFKTITANWQLLFYLFRMKYTKN^ 
30 I IIIIIM: I: |:| II II I Hil I , 

GGVGLGL&IVQDIVKKHNGKIIVESICJISIQGTTFIIBLPyS 
440 450 460 

SEQ ID 8982 (GBS170d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 181 (lane 4; MW 35kDa) and in Figure 123 (lane 5-7; MW 35kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
123 (lane 2-4; MW 60kDa) and in Figure 184 (lane 3; MW 60kDa). Purified GBS170d-GST is shown in 
Figure 243, lane 7; purified GBS170d-His is shown in Figure 234, lanes 5-6. 

Example 2103 

A DNA sequence (GBSx2218) was identified in S.agalactiae <SEQ ID 6505> which encodes the amino 
acid sequence <SEQ ID 6506>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0502 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06906 GB:AP001518 argininosuccinate synthase 
(citrulline-aspartate ligase) [Bacillus halodurans] 
Identities = 262/396 (65%), Positives = 321/395 (80%), Gaps = 1/396 (0%) 

Query: 1 MGKEKLILAYSGGLDTSVAIAWLK-KD'fflDVIAVCMDVGEGKDLDFIHDKALTIGaiESYI 59 
55 M K+K++IjAYSGGLDTSVAI WL K YDVIAV +DVGEGKDL+F+ +KAri +GAIESY 

Sbjct: 1 MSKKKVVIAYSGGI£)TSVaiKWLSDKBYDVIAVGLDVGEGKDLEFVKEKALKVGAIESYT 60 



Query: 60 LDVKDBFAEHFVLPAUSaHAMYEQKYPLVSaiiSRPIIAQKLVEMaHQTGATTiaHGCTGK 119 
+D K EFAE FVLPALQRHA+YEQKYPLVSALSRP+I++KLVE+A QTGA +AHGCTGK 
60 Sbjct: 61 IDAKKEFAEEFVLPALQAHALYEQKXPLVSALSRPLISKKLVEIAEQTGAQAVaHGCTGK 120 
Query: 120 GITOQTOFEVAIiUUliDPELKVIAPVREVinKIfflREEEITFAKaNGVPIPADLDKPYSIDQNLW 179 

GNDQVRFEV+I AL+P L+V+APVREW W R+EEI +KK N +PIP DLDNPYS+DQNLW 
Sbjct: 121 GiroQTOFEVSIQAIJIENLEVLRPVREmWSRDEEIEnfAKKHNIPIPIDI^^ 180 

Query: 180 GEJUmCGVI^PWNQZ^EEAFGITKSPEEZiPDCMYIDITFQiraKSIAINNQEMTI^ 239 

GR+NECG+IiE+PW PE A+ +T + E+APD E ++I F+ G P+ +N + + +LI 
Sbjct: 181 GRSKECGILEDPWATPPE6AYELTVAIEDAPDQPEIVEIGFERGIPVTIJJGKSYPVHELI 240 

Query: 240 LSUreiAGKHGIGRIDHVElTOIiVGIKSREIYECPAAMVLIiAAHKEIEDLTIjTOEVSHFK^ 299 

L LW+IAGKHG+GRIDHVENRLVGIKSRE+YECP AM L+ AHKE+EDLTL +EV+HFKP 
Sbjct: 241 LEMJQIAGKHGVGRIDHVENRLVGIKSREVYECPGAMTLIKAHKELEDLTLTKEVAHFKP 300 

Query: 300 ILENELSNLIYNALWFSPATKAIIAYVKETQKVVNGTTKVKLYKGSAQWARHSSNSIjYD 359 

++E +++ LIY LWFSP A+ A++KETQ V G +VKL+KG A V R S SLY+ 
Sbjct: 301 WEKKIAELIYEGLWFSPLQPALSAFLKETQSTVTGVWVKLFKGHAIVEGRKSEYSLYN 360 

Query: 360 ENLATYTAADSFDQDAAVGFIKLWGLPTQVNAQVNK 395 

E LATYT D FD +AAVGFI LMGLPT+V + VNK 
Sbjct: 361 EKLATYTPIMJEFDHlilAATOFISLMGLPTKyYSMVNK 396 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



25 Example 2104 

A DNA sequence (GBSx2219) was identified in S.agalactiae <SEQ ID 6507> which encodes the amino 
acid sequence <SEQ ID 6508>. This protein is predicted to be argininosuccinate lyase (argH). Analysis of 
this protein sequence reveals the following: 

Possible site: 43 
30 »> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .2131 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

35 bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB069G5 GB:AP001518 argininosuccinate lyase [Bacillus halodurans] 
Identities = 284/454 (62%) , Positives = 350/454 (76%) 

Query: 6 KLWGGRFESSLBKWVEEFGASISFDQKLAPYDMKASMainnm<3KTDIISQEEaGIiIKD6 65 

Sbjct: 

Query: 66 LKILQDKYRAGQLTFSISNEDIHMNIESLLTAEIGEVAGKLHTARSRNDQmTDMHIiYLK 125 

L IL +K + G+Ii +S++NEDIH+NIE LL EIG V GKLHT RSRNDQVATDMHLYL+ 
Sbjct: S3 LHILLEKAKKGELNYSVANEDIHLNIEKLLIDEIGPVGGKLHTGRSRNDQVATDMHLYLR 122 

Query: 126 DKLQEI^KKLLHLRTTLWIiABNHIYTVIvlPGYTHLQHAQPlSFGHHLMAYYTMFTRDTER 185 

+ +E+4-+ + +++ LV A+ H+ T++PGYTHLC AQPISF HHL+AY+ M RD R 
Sbjct: 123 KQTKEILQLVKWQAALVEQAKQHVETLIPGYTHIORAQPISFAHHI^YFWMLERDYGR 182 



Query: 246 SILMMHLSRFCEEIlHWCSYEYQFITLSDTFSTGSSIMPQKKNPDMaELIRGETGRVYGN 305 

S+LM HLSR CEE+I W S E+QF+ + D F+TaSSIMPQKKNPDMaELlRGKrGRVYG+ 
Sbjct: 243 SLLMTHLSRLCEELILWSSQEFQFVEMDDAFATGSSIMPQKKNPDMAELIRGKTORVYGS 302 

Query: 306 LFSLLTVMKSLPIA^KDLQBDKEGMFDSVETVSIAIEIMAMvILETMTVNEHIMMTSTET 365 

LFSLLTV+K LPIA'XIJKD+QEDKEGMFD+V+TV ++ I A M4+TM V E M + 
Sbjct: 303 LFSLLTVLKGLPIAYNKDMQEDKEGMFDAVKTVKGSLAIFAGMIQTMICVKEETMTKAVHQ 362 

Query: 36S DFSNATELADYLASKGVPPRKAHEIVGKLVLECSKNGSYLQDIPLKYyQEISELIENDIY 425 

DFSNATEIADYLA+KG+PFR4AHK+VGKLVL C + G YIi D+PL Y+ S+L + DIY 
Sbjct: 363 DFSKRTELADYLATKGMPFREAHEWGKLVLLCIQKGIYLIiDLPLSDYKflASDLFDEDiy 422 

Query: 426 EILTARTAVKRRNSLGGTGFDQVKRQIIiLARKEL 459 

++L KT V RR S GGTGF +VKK I A K L 
Sbjct: 423 DVLQPKTVVARRTSAGQTGFTEVKKAIAKAEKIL 456 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2105 

A DNA sequence (GBSx2220) was identified in S.agalactiae <SEQ ID 6509> which encodes the amino 
acid sequence <SEQ ID 6510>. This protein is predicted to be class-II aldolase (fba). Analysis of this 
protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2930 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9289> which encodes amino acid sequence <SEQ ID 9290> 
was also identified. Analysis of this sequence reveals: 

GvH: Signal Score (-7.5): -2.92 

Possible site: 42 
>>> Seems to have no N-terminal signal seq. 
ALOM program count: 0 value: 0.37 threshold: 0.0 
PERIPHERAL Likelihood = 0.37 66 
modified ALOM score: -0.57 

*** Reasoning Step: 3 

Final Results 

bacterial cytoplasm --- Certainty=0.2930 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MAXVSAEKFVQAARDNGYAVGGFNTMSILEmQAILRAAEAKKaPVLIQTSr^^ 60 

MAIVSAEKF++AAR+NSYAVGGF^m^MLEWTQAILRAAEAKKAP+LIQTSlyK3AA 
Sbjct: 1 MAIVSAEKFIKAARENGYAVGGFN™^LEWTQAILRAaEAKKAPILIQTS^O«^^ 60 

Query: 61 KLCKQLIETLVESMGITVPVAIHLDHGHYDDAIECIEVGYTSIMFDGSHLEVEENLEKAR 120 

KLCK LIE LVESMGITVPVAIHLDHGH+4.DALECIEVGYTS+MFDGSHLPVEENLEKa+ 
Sbjct: 61 ia.CKTLIENLVESMGITTOVAIHLDHGHFEnAIiECIEVGYTSVMFDGSHLPVEENLEKAK 120 

Query: 121 EWAKAHAKGISVEAEVGTIGGEEDGIVGKGELAPIEDAKAMVETGIDFLAAGIGNIHGP 180 

EWAKAHAKG+SVEAEVGTIGGEEDGIVG GELAPIEDAKAMV TGIDFLAAGIGNIHGP 
Sbjct: 121 EWAKAHAKGVSVEAEVGTIGGEEDGIVGGGELAPIEDAKAMVATGIDFLAAGIGNIHGP 180 

Query: 181 YPAMffiGLDLDHLKXLTEAVPGFPIVIfiGGSGIPDDQIQEAIKLGVAIOTNVHTECQLAFC 240 

YPiam+GI. LDHLKKLT AVPGFPIVLHGGSGIPDDQI4- AIKLGVAKVNVNTECQ+AF . 
Sbjct: 181 YPANWQGLHLDHLKKLTAAVPGFPIVLHGGSGIPDDQIKiiMKIGVAK^^ 240 

Query: 241 QA 242 
+A 

Sbjct: 241 KA 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6511> which encodes the amino acid 
sequence <SEQ ID 6512>. Analysis of this protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2930 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/242 (89%) , Positives = 228/242 (93%) 

Query: 1 NIAIVSAEKFVQAARDlSIOTAVGGFKTISnSILEWrQAIIjRAaEaUCK^^ 60 

NaiVSAEKFVQiiaR+NGYAVGGFKTMmiEVrrQA 
Sbjct: 1 MAIVSAEKFVQaaRENGmVGGFmMnjEVmSAIIJlAAEaK^^ 60 

Query: 61 EaXKQLIETLVESMGITVPVAIHLDHQEmDDALECIWGYTSIMFIXSSHLPVEEaSLBKAR 120 

K+C+ LI liVESMGITVPVAIHLDHGHY+naiiECIEVGYTSIMFDGSHLPVEEanii K 
Sbjct: 61 KVCQSLITNLVESMGITVPVAIHLDHGHVEDaLECIEVGYTSIMFDGSHLPVEENLAKTA 120 

Query: 121 EVVRKaHAKGISVEAEVGTIGGEEDGIVGHGELAPIEDAKaMVETGIDFLAaSIGNIHGP 180 

EW AHARG+SVEAEVGTIGGEEDGI+GKGELAPIEnAKAMVETGIDFLAAGIGNIHGP 
Sbjct: 121 EWKIAHAKGVSVEAEVGTIGGEEDGIIGKGELAPIEnAKAMVETGIDFLAAGIGNIHGP 180 

Query: 181 YPANVffiGLDIJJHLKKLTEAVPGFPIVIflGGSGIPDDQIQEAIKICVaKWVNTECQI^ 240 

YP 3SIMEGL LDHL+KUT AVPGFPIVIflGGSGIPDDQI+EAI+LGVAKVNVNTE Q+AF 
Sbjct: 181 YPENWEGLALDHLEKLTAAVPGFPIVIfflGGSGIPDDQIKEAIRICVAKraamBSQIAFS 240 

Query: 241 QA 242 
A 

Sbjct: 241 NA 242 

SEQ ID 9290 (GBS683) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 150 (lane 8 & 10; MW 55kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 150 (lane 11-13; MW 30kDa) and in 
Figure 184 (lane 11; MW 30kDa). 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 2106 

A DNA sequence (GBSx2221) was identified in S.agalactiae <SEQ ID 6513> which encodes the amino 
acid sequence <SEQ ID 6514>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2775 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:A2\A88585 GB:Mie954 unknown protein [Streptococcus mutans] 
Identities = 109/229 (47%) , Positives = 156/229 (67%) , Gaps = 1/229 (0%) 





Query: 1 MFSGKRLKiaailTLGYSQSELaDKtHINRSSYEflWEtffiKTKPNQSl^^ 60 

MFS H-+bK+RR LG SQ++ MKL I+R SYENME RTKENQ NL +IA LL V Y 
Sbjct: 1 MFSSQKLICERRKKLGLSQAQTMKIfilSRPSYEmiEIGKTKENQKOTiDKIML^ 60 

Query: 61 FESEYKIVNTYLQLSLOKQEKVEKYftEELLQTQKVHEKIVPLPAVEVLSEIQLSAGPGEG 120 

F S++ IV Y +L+ N+ K KY++ Lln- Q ++ +LSAG G 
Sbjct: 61 FLSQHDIVEIYTMjraSNKTKTI^SQHLLEQQDKKim.MKNKRYPYRVYEKLSAGTGYS 120 

Query: 121 LYDEFETETVYSEDEYTGFDIATWISGNSMEPVYKDGEVALIRSTSFDHDGAVYAIiNWNG 

+ + +TV+ ++E D A+WI G+SMEP++ +GEVALI+ TSFD+DGA+YA++W+G 
Sbjct: 121 YFGDGNFDTVFYDEEID-HDFASWIFGDSiyiEPIFIiNGEVALIKQTGFDYDGaiYAIDWDG 179 

Query: SLYIKKDYREEDGFRMVSINPDVaERFIPFEDEIRIVGKIVGHFMPVIG 229 

YIKK+YREE G R+VS+N A++F P++4- RI+G IVG4-F+P+ G 
Sbjct: QTyiKKVYREETGLRLVSIilKKSaDKFAPYDENPRIIGLIVGNFIPLEG 228 





A related DNA sequence was identified in S.pyogenes <SEQ ID 6515> which encodes the amino acid 
sequence <SEQ ID 6516>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4340 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 84/209 (40%) , Positives = 130/209 (62%) , Gaps = 9/209 (4%) 

Query: 25 LHINRSSYFNWENEKTKPNQSNLKQLAILI.DVPETYFESEYKIVNTYLQi:.SLQNQEKVEK 84 

LH+N+ + NWE K PN+ +L L L +V YF+ Y+++ Y QL++ N+EKV 
Sbjct: 5 LHVNKMTISIWEKGKNIPNEKHLNALLHLFNVTSDYFDPNYRLLTPYNQLTISNKEKVIG 64 

Query: 85 YAEELLQTQ KVHEKIVPLFAVEVLSEIQLSaGPGEGLYDEFETETVYSEDEYTG 138 

y+E LL Q + +K L+A V LSAG G -H + + V+ DE 

Sbjct: 65 YSERIinNHQIDKKSKDLIDKPSQLYAYRVYES--LSAGTGYSYFGIX3NFDVVFY-DEQi:iE 121 

Query: 139 FDIATWISGNSrffiPVYKDGEVALIRSTGFDHDGRWALNMNGSLYIKKLYREEDGFRMVS 198 

+D A+W+ G+SMEP Y +6EV LI+ FD+DGA+YA+ W+G YIKK++RE++G R+VS 
Sbjct: 122 YDFASWVPGDSMEPTYIiNGEWLIRQNSFDYDGAIYAVEVroGQTyiKKV^ 181 



Sbjct: 182 LNKKYSDKFAPySEEPRIIGKIiaNFRPL 210 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
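
The summary line printed with each alignment (for instance "Identities = 84/209 (40%)" for the GAS and GBS alignment above) can be cross-checked mechanically, which is useful when the text has passed through OCR. The following Python fragment is only an illustrative sketch and is not part of the analysis described here; the function names `parse_summary` and `is_consistent` and the one-point tolerance are assumptions made for the illustration.

```python
import re

# Matches fields such as "Identities = 84/209 (40%)" in a BLAST-style summary line.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\s*\)"
)


def parse_summary(line):
    """Return {label: (count, alignment_length, reported_percent)}."""
    return {
        label: (int(count), int(length), int(percent))
        for label, count, length, percent in SUMMARY_RE.findall(line)
    }


def is_consistent(count, length, reported_percent, tolerance=1.0):
    """True if count/length agrees with the reported percentage.

    The tolerance (an assumption) allows for the percentage having been
    rounded or truncated to a whole number.
    """
    return abs(100.0 * count / length - reported_percent) <= tolerance


if __name__ == "__main__":
    line = "Identities = 84/209 (40%) , Positives = 130/209 (62%) , Gaps = 9/209 (4%)"
    for label, (count, length, percent) in parse_summary(line).items():
        status = "ok" if is_consistent(count, length, percent) else "check"
        print(f"{label}: {count}/{length} reported {percent}% -> {status}")
```

A field flagged as "check" does not by itself show which number is wrong; it only marks a summary line that deserves comparison against the original record.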

Example 2107 

A DNA sequence (GBSx2222) was identified in S.agalactiae <SEQ ID 6517> which encodes the amino 
acid sequence <SEQ ID 6518>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2387 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
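
The "Final Results" block reported for each protein lists a certainty value for three candidate locations (cytoplasm, membrane, outside). As a hedged illustration only, the Python sketch below reads such a block and returns the location with the highest certainty. The helper names `parse_final_results` and `best_site` are hypothetical, and the regular expression is deliberately tolerant of stray spaces inside the number, an assumption made because this text was recovered by OCR.

```python
import re

# One result line: "bacterial <site> ... Certainty=<value> (<verdict>)".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, verdict)} for every result line in text."""
    results = {}
    for site, value, verdict in RESULT_RE.findall(text):
        certainty = float(re.sub(r"\s+", "", value))  # "0. 2387" -> 0.2387
        results[site] = (certainty, verdict.strip())
    return results


def best_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


EXAMPLE_2107 = """
bacterial cytoplasm --- Certainty=0.2387 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

if __name__ == "__main__":
    parsed = parse_final_results(EXAMPLE_2107)
    print(best_site(parsed), parsed[best_site(parsed)])
```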

Example 2108 

A DNA sequence (GBSx2223) was identified in S.agalactiae <SEQ ID 6519> which encodes the amino 
acid sequence <SEQ ID 6520>. This protein is predicted to be UmuC MucB homolog (uvrX). Analysis of 
this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2195 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9925> which encodes amino acid sequence <SEQ ID 9926> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: LHTSLCVMSRADNSMLIIJ^SPMFKKVFGKGNVGRAYDLPFDVHTRramRftKISGLP 98 

L LCVMSRADNSAGLILASSPMFKKVFGK NVGR+YDLPFDV TRKF+YY AK GLP 
Sbjct: 5 LRLRLCVMSRADNSAGLILASSPMFKKVFGKSNVGRSYDLPFDVKTRKFSYYNAKKQGLP 64 

Query: 99 

+V +IE WAK T IVP LI N-i-EIQK+FQ++A P DI PYSIDEGFIDLTS 
Sbjct: 65 

Query: 159 

SLNYFV DKS+SRKDKLD++SA IQ IW KTG+YSTO35SNANPLLAKLAIJDNEAK T 
Sbjct: 125 

Query: 219 

TMRANWSYEDVE KVW IPKMTDFWGIG+R EKRL+ LGI+SIKELA +P ++KKE G 
Sbjct: 185 

Query: 279 

+G++ WFHANGIDESNVH+PY+PK+ GIGNSQVL KDY +Q DIE++LREMAEQVA+RLR 
Sbjct: 245 

Query: 339 

KKATW+I+H-GYS E K+SIN Q KI P N+T + + V+ LF H-KY G&+R++A 
Sbjct: 305 

Query: 399 

V Y GIiVDE+F +ISLFDD E+ EKEE+L++ ID+IR PGF ++ K ++L + SR I+R 
Sbjct: 365 

Query: 459 




Sbjct: 425 SKLIGGHSAGGLDGLK 440 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2109 

A DNA sequence (GBSx2224) was identified in S.agalactiae <SEQ ID 6521> which encodes the amino 
acid sequence <SEQ ID 6522>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4016 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2110 

A DNA sequence (GBSx2225) was identified in S.agalactiae <SEQ ID 6523> which encodes the amino 
acid sequence <SEQ ID 6524>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2088 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MIDRSYLPFKVRREYQDRKM2UO#lGFFLSEEITAGia3SEIJIKTOYTSELSlSDKL^^ 60 

MIDRSYLPF+ AEEYQD KM KWMGFPLSEHT+ L + NKV Y S+LS+ KLLIJ:i+Q+ 
Sbjct: 1 MIDRSYLPFQSRREYQDTKMQromGFFLSEmSaLTDDftNKVTYMSDLSLEKKLLDLS^ 60 

Query: 61 YSNQLNGIIAVPGQ YySGKVDra,TENHVSLRrKTGFVSIPIKDILSIDL--EVEyE 114 

Y+ QIiN I V + Y+G + +LT + + +KT TG +++ +KDI+SI+L BV YE 

Sbjct: 61 YAGQIOTEIimaCKlOTQVSYTGTIPSLTKDFILIKTTTGHINLiaiKDIVSIEIiVEBVLYE 120 

Query: 115 SA 116 
SA 

Sbjct: 121 SA 122 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2111 

A DNA sequence (GBSx2226) was identified in S.agalactiae <SEQ ID 6525> which encodes the amino 
acid sequence <SEQ ID 6526>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4025 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9927> which encodes amino acid sequence <SEQ ID 9928> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2112 

A DNA sequence (GBSx2227) was identified in S.agalactiae <SEQ ID 6527> which encodes the amino 
acid sequence <SEQ ID 6528>. This protein is predicted to be soluble transducer HtrXIII. Analysis of this 
protein sequence reveals the following: 
Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5246 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2113 

A DNA sequence (GBSx2228) was identified in S.agalactiae <SEQ ID 6529> which encodes the amino 
acid sequence <SEQ ID 6530>. Analysis of this protein sequence reveals the following: 
Possible site: 60 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5131 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2114 

A DNA sequence (GBSx2229) was identified in S.agalactiae <SEQ ID 6531> which encodes the amino 
acid sequence <SEQ ID 6532>. This protein is predicted to be pXO2-78. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2105 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 27 SGQlYEHPDHDSPRIFADTNTEKWFSRDIQGDVIDPVQLVftSVSEKKaLSYLETG--aFE 84 

S + Y +HDS I N F W SR + G++I FVQ V SF A+ L G +E 
Sbjct: 39 SERYYlU.T,EHDSLIIDRKKNQFYWNSRGVNGNIIKFVQEVEnASFPGMQRtLDGEQDYE 98 

Query: 85 EAKVIEETYQPFQYYCjREEP FQQftRTYLKDIRGLSNQTINSFGRQGnLAQATyQaE 140 

+A I +P+ Y E+ F +AR YL + R + Q +++ +GL+ Q Y 

Sbjct: 

Query: 141 SVLVFKSFDHNGTLQAASLQGLVKNEEKYDRGYLKKIMKGSHGHVGISFDIGNPKRLIFC 200 

+VL G + S QG+VK++ KYRG KIKS + G+ GP+LF 
Sbjct: 157 ]m.FI:.WKDRETGAV^03SEQGWKSD-KYKRGAWK3IQKNSTANYGFNVIJ!^ 215 

Query: 201 ESVID^mSYYQLHQKQLSDVRIlIS^IEGLKLSVmYQTIJUJAAEEQGKIlAFLDTVKPIRLS 260 

ES ID++SY LH+ L D LISMEGLK VI + 
Sbjct: 216 ESDIDLLSYATLHKHNLKDTHLISMEGLKPQVI FN 250 

Query: 261 HYLQAIQETTTPFQTHSNVITMAVDNDEMREFYQKL SDRSFPIFQ-DLPPLQ 312 

+Y++A + + +++ VnND+AG+ F ++L +D F+ + P 
Sbjct: 251 YYMKACERIGDV PDSLSLCVDNDKAGKAFVERLIHFRYEKNDGSIVAFKPEYPQRP 306 

Query: 313 RLETKSDWNDIVKR 326 

E K DWND KR 
Sbjct: 307 SEEKKNDWNDECKR 320 





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
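
The "Identities" figure quoted for an alignment is a count of columns in which the query and subject residues are the same. As a minimal sketch under stated assumptions, the Python fragment below recomputes that count for the final, cleanly legible block of the alignment above (Query 313-326 against Sbjct 307-320). The function names are hypothetical, and the sketch covers identities only: scoring "Positives" would need a substitution matrix, which is not attempted here.

```python
def count_identities(query, subject, gap="-"):
    """Count aligned columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != gap)


def identity_line(query, subject, gap="-"):
    """Build the middle line of an alignment block, showing identities only."""
    return "".join(
        q if (q == s and q != gap) else " " for q, s in zip(query, subject)
    )


if __name__ == "__main__":
    query = "RLETKSDWNDIVKR"    # Query: 313 ... 326
    subject = "SEEKKNDWNDECKR"  # Sbjct: 307 ... 320
    print(count_identities(query, subject), "of", len(query), "columns identical")
    print(repr(identity_line(query, subject)))
```

Apart from spacing that was collapsed in the printed text, the identity line produced for this block contains the same residue groups as the middle line shown in the alignment.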



Example 2115 

A DNA sequence (GBSx2230) was identified in S.agalactiae <SEQ ID 6533> which encodes the amino 
acid sequence <SEQ ID 6534>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.7013 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2116 

A DNA sequence (GBSx2231) was identified in S.agalactiae <SEQ ID 6535> which encodes the amino 
acid sequence <SEQ ID 6536>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1310 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2117 

A DNA sequence (GBSx2232) was identified in S.agalactiae <SEQ ID 6537> which encodes the amino 
acid sequence <SEQ ID 6538>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.6726 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9373> which encodes amino acid sequence <SEQ ID 9374> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2118 

A DNA sequence (GBSx2233) was identified in S.agalactiae <SEQ ID 6539> which encodes the amino 
acid sequence <SEQ ID 6540>. This protein is predicted to be phosphoglucomutase (manB). Analysis of 
this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2147 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9355> which encodes amino acid sequence <SEQ ID 9356> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB9S418 GB:AJ243290 phosphoglucomutase [Streptococcus thermophilus] 
Identities = 391/465 (84%) , Positives = 424/465 (91%) , Gaps = 1/465 (0%) 

Query: 1 MAQHGIKSYVFEAIRPTPELSFAVRHLNAYAGIMVTASHNPAPFNGYKVYGQDGGQLPPA 60 

+A HGIKSYVFE+LRPTPELSFAVRHL+ +AGIM+TASHNPAPFWGYKVYG+DGGQ+PPA 
Sbjct: 107 tJ«UIGIKSYVEESLRPTPELSFAVRHIflTFAGIMITASHNPAPENGYKVYGEDGGQMPPA 166 

Query: 61 DaDALTDFIRAIENPFAVELMJLIJESKSSGLIQVIGEDVDIEyLREVKI^ 120 

DADALTD+IRAI+NPF V+LADL++SK+SGLI++IGE+VD EYL+EVKDVNINQDI.IN + 
Sbjct: 167 EjanaLTDYIRAIDNPFTVKLADLEDSKASGLIEIIGENVDAEyLKEVKI^ 226 

Query: 121 GKDMKIAm'PIjHGTGEMLTRRALaQftGFESVVVVESQaKaDPDFSTVKSPNPESQaAFAIi 180 

G+DMKIVYT LHGTGEML RRALAQAGF+4V WE+QA DP TVKSPNPE+Q AFAL 
Sbjct: 227 GRDMKIVYTSLHGTGKMLWRAIiAQAGFDAVQWEAQAVPHiyDFLTVKSPNPENQDAFAL 286 

Query: 181 AEELGREVDADVLVATDPDADRLGVEIRQPDGSYKNLSGNQIGAIIAKYILEAHKTAGTL 240 

AEELGR VDADVLVATDPDADRLGVEIRQPDGSY NLSGNQIGAIIAKYILEAIiKTAGTL 
Sbjct: 287 AEELGRIWDfiDVLVATDPDADRLGVEIRQPDGSYl^ILSGNQIGAIIAKYILEAHKTAGTL 346 

Query: 241 PENAALAKSIVSTELVTKIAESYGATMFNVLTGFXFIAEKIQEFEEKHNHTYMFGFEESF 300 

P NAAL KSIVSTELVTKIAESYGATMENVLTGFKFI EKI EFE +HN+TYMFGFEESF 
Sbjct: 347 PANaALCKSIVSTELVTKIAESYGATMFNVLTGFKFISEKIHEFETQEJNYTYMFGFEESF 406 

Query: 301 GYLIKPFVRDKIMQAVIiLVABIAAYYRSEGLTLADGIDEIYKEyGYFAEKTISVTLSGV 360 

GYLIKPFVRDKmiQAVL+VAEIAAYYRSRG+TIiADGI+EIYK+yGYF+EKTISVTLSCSV- 
Sbjct: 407 CSYLIKPFVRDKDAIQAVLIVAEIAAYYRSRGMTLADGIEEIYKQyGYFSERTISVTLSGV 466 

Query: 361 a3aAEIKKI^©KFRENGPKQFNOTD^VLLEDFQKQTATKNDGTISNLTTPPSNVLm■^ 420 

DGAAEIKKIMDKFR N PKQFNNTDI EDF +QTAT DG + LTTPPSNVLKY LA 
Sbjct: 467 DGAAEIKKIMDKFRRNAPKQFNNTDIAKTEDFLEQTATTADG-VEKLTTPPSNVLKYILA 525 

Query: 421 DDSWIAVRPSGTEPKIKFYIATVGNDLADAETKIANIEKEITTFV 465 

DDSW AVRPSGTEPKIKFYIATUG AnR+ KIANIE EI FV 
Sbjct: 526 DDSWPAVRPSGTEPKIKFYIATVGBTEaEAKEKIANIEaEINAFV 570 

There is also homology to SEQ ID 6156: 

Query: 1 MAQHGIKSYVFEAIJlPTPELSFAVRHiaaYAGIMVTASHNPAPENGYKVYGQDGGQLPP^ 60 

+AQHGIKSYVEEAIiRPTPEI,SFAVRHIJ!IAYAGIMVTASHNPAPENeYKVYGQDGGQLPPA 
Sbjct: 107 lAQHGIKSYVFEAIJlPTPELSFAVRHIJJAYAGIMVTASHNPAPETSrGYKVYGQDGGQLPPA 166 

Query: 61 DADALTDFIRAIENPEAVELADLDESKSSGLIQVIGEDVDIEYLREVKDVNINQDLINNF 120 
DADALTDFIRAIENPFAVELADLDE+KSSGLIQVIGEDVD+EYLREVKDVNINQDLINNF 

Sbjct: 167 DADALTDFIRaiENPFAWLADLDENKSSGLIQVIGEDVDMEYLREVKDVNINQDLIWNF 226 

Query: 121 GKDMKIVYTPIaHGTGEMLTRRALaQaGFESWWESQAKADPDFSTVKSPNPESQAAFAL 180 

GKDMKIVYTPIiHGTGE^QiTREALaQAGFESVVVVESQAKftDPDFSTVKSPNPKSQa^ 
Sbjct: 227 GKDMOVYTP]:flGTGE^ttTRRAIAQAGFESVVVVESQAKaDPDFSTVKSE^IPESQ&AFAL 286 

Query: 181 AEELGREVDADVLVATDPDADRLGVEIRQPDGSYKNLSCSNQIGAIIAKYILEAHKTAGTL 240 

AEBIfiREV+ADVLVATDPnADRLGVEIRQPIWSYKNLSGNQIGAIIAKYILEAHKrAGTT. 
Sbjct: 287 JffiELGREVEADWLVATDPI)ADRLGVEIRQPIXKYK^^^SGNQIGAIIAKyILEaHKTAGT^ 346 

Query: 241 PENaAIJUCSIVSTELVTKIAESYGRTMF^rVLTGFKFIAEKIQEFEEKHNHTYMF6FEESF 300 

PENAALAKSIVSTELVTKIAESYGATMENVLTGFKPIAEKIQEFEEKHNHTVMFGFEESF 
Sbjct: 347 PENAAIAKSIVSTELVTKIAESYQATMENVLTQFKFIAEKIQEFEEKHNHTYMFGFEESF 406 

Query: 301 GYLIKPFWDKDAIQAVLLVAEIAAYYRSRGLTLADGIDEIYKEYGYFAEKTISVTLSGV 360 

GYLIKPFWDKDAIQAVLLVAEIAAYYRSRGLTLADGIDEIYKEYGYFAEKTISVTLSGV 
Sbjct: 407 GYLIKPFVRDKDAIQAVLLVAEIAAYYRSRGLTLADGIDEIYKEYGYFAEKTISVTLSGV 466 

Query: 361 DGAAEIKKirTOKFREMSPKQEmiTDIVLLEDFQKQTATKlSnXSTISMLTTPPSN^ 420 

DC3AaEIKKI^roKFRENGPKQFNlOT)^VLLEDFQKQTATKNTOTISHLTTPPS^^ 
Sbjct: 467 DGAAEIKKIMDKPRENGPKQFNimJIVLLEDFQKQTATKNDGTISNLTTPPSK^ 526 

Query: 421 DDSWIAVRPSGTEPKIKFYIATVGIJDLADaETKIANIEKEITTFV 465 

DDSWIAVEPSGTEPKIKFYIAT+G+ Xi A+ KIANIE El TFV 
Sbjct: 527 DDSWIAWPSGTEPKIKFYIATIGDTLDIAQEKIANIETEINTFV 571 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 2119 

A DNA sequence (GBSx2235) was identified in S.agalactiae <SEQ ID 6541> which encodes the amino 
acid sequence <SEQ ID 6542>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1564 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9905> which encodes amino acid sequence <SEQ ID 9906> 
was also identified. There is also homology to SEQ ID 32. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2120 

A DNA sequence (GBSx2236) was identified in S.agalactiae <SEQ ID 6543> which encodes the amino 
acid sequence <SEQ ID 6544>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following: 
Possible site: 48 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -9.92 Transmembrane 162 - 178 ( 135 - 
INTEGRAL Likelihood = -7.11 Transmembrane 58 - 74 ( 56 - 78) 
INTEGRAL Likelihood = -6.42 Transmembrane 136 - 152 ( 135 - 
INTEGRAL Likelihood = -5.20 Transmembrane 23 - 39 ( 21 - 49) 
INTEGRAL Likelihood = -1.75 Transmembrane 485 - 501 ( 485 - 

Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
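
The INTEGRAL lines above each report one predicted transmembrane segment with its likelihood score and residue range. The Python sketch below, offered only as an illustration, collects those segments and orders them along the sequence. The function name `parse_segments` is hypothetical, and the bracketed ranges that are truncated in the text are deliberately ignored rather than guessed.

```python
import re

# Captures the likelihood score and the segment boundaries of one INTEGRAL line.
SEGMENT_RE = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return (start, end, likelihood) tuples sorted by start position."""
    segments = [
        (int(start), int(end), float(score))
        for score, start, end in SEGMENT_RE.findall(text)
    ]
    return sorted(segments)


EXAMPLE_2120 = """
INTEGRAL Likelihood = -9.92 Transmembrane 162 - 178
INTEGRAL Likelihood = -7.11 Transmembrane 58 - 74
INTEGRAL Likelihood = -6.42 Transmembrane 136 - 152
INTEGRAL Likelihood = -5.20 Transmembrane 23 - 39
INTEGRAL Likelihood = -1.75 Transmembrane 485 - 501
"""

if __name__ == "__main__":
    for start, end, score in parse_segments(EXAMPLE_2120):
        length = end - start + 1
        print(f"{start:>4} - {end:<4} length {length:>2}  likelihood {score:6.2f}")
```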

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35376 GB:AE001710 ABC transporter, ATP-binding protein 
[Thermotoga maritima] 
Identities = 216/552 (39%) , Positives = 336/552 (60%) , Gaps = 3/552 (0%) 



Query: 26 MALLGTVVQVCLTVYLPVLIGQAVDVVLSPHSMILLLPIMWKMlAVILaNTIIQWINPLL 85 
M + V L V P LIG+ +DVV P LL M + + +++ W+ + 
Sbjct: 41 MVFViAm/SSILGVLSPYLIGKTIDWWPRRFDLLPRYMLILGTIYALTSLLFWLQGKI 100 

Query: 86 YmLIFHWASLRKAVMEKIiNLLPIAYIiDKRGIGDLISRVm 145 

L V LRK + EKL +P+ + D+ GD+ISRV D + ++N L QFF 
Sbjct: 101 MLTLSQDWPRLRKELFEKLQRVPTOFFDRTPHGDIISRVINDVDNnraVLGNS 160 

Query: 146 GLLTIIVTIFSM?VKIDLIiMLFLVLFLTPLSLF]:j«FIAKKSY-HLYCM2TASRGRQTQFI 204 

G+-1-T+ + M ++++++ + L + PL++ + + ++++ + Y+NQ G+ I 
Sbjct: 161 GIVTIJWaVimFRVNVILSLVTLSIWLTVLITQIVSSQTRKYFYENQRVL-GQLMQII 219 

Query: 205 EEMVSQESLIQaFSaQEESSDHFRTINQEYftNFSQSAIFYSSTVNPSTRFINSLIYGFLA 264 

EE +S F+ +E+ + P +N+ A +S + P +N+I1 + ++ 

Sbjct: 220 EEDISGLTVIKLFTRBEKE^EKFDRVlraSLRKVGTKRQIFSGVIlPPIMJ^^Vl^^ 279 

Query: 265 GIGRLRIMSGRFSVGQLITPLNYVNQYTKPFtTOISSVLSEMQSRLRCaERLYSILEESSP 324 

G G + +VG + TF+ Y Q+T+P N++S+ + +Q MA AER++ IL+ 
Sbjct: 280 GFGGWUUiKDIITVGTIATFIGYSRQFTRPIiNELSNQFimiQfmiASaERIFEIIiDLEE 339 

Query: 325 NITGTEKIiDSSTVKBQIDFKNWFGYNKSKLLmGIKLHIPAGAKTRIVGPTaaGKSTM 384 

+ ++ V+G+I+FKW F Y+K K +L I HI G KWA+VGPTG+GK+T++ 
Sbjct: 340 K-DDPDAVELREVHGEIEFKlWWFSYDKiaa>VLKDITFHIia>GQKWALVGPTGSGCT^ 398 

Query: 385 KLImFYEVDGGNIL]iDCKPITDYEPSQLRQEIG^IVLQETWIlKSaTIHDNIAYMJPKRSR 444 

KL+MRFY+VD G IL+D I + S LR IG+VLQ+T L S T+ +N+ Y NP A+ 
Sbjct: 399 OT,MJFYDVDRGQILVDGIDIRKIKRSSLRSSIGIVLQOTILPSTTVKEra.KyGNPGATD 458 

Query: 445 EEVISMKAANADFFIKQLP^IGyDTyLEDAGDSLSQGQCQLLTIARIFLKLPRILILDEA 504 

EE+ EAAK ++D FIK L? GY+T L D G+ LSQGQ QLL I R PL P+ILILDEA 
Sbjct: 459 EEIKEAAKLTHSDHFIIOILPEGYETCLTDNGEDLEQGQRQLriMTRAFLANPKILILDEA 518 

Query: 505 TSSIDTRTEVLVQEAFQMIMKGRTSFIIAHRLSTIQTADIILVMVSGEIVEVGmSELMA 564 

TS++DT+TE +Q A LM+G+TS IIAHRL+TI+ AD+I+V+ GEIVE+G H EL+ 
Sbjct: 519 TSNVD^KTEKSIQaA^lWKIl^ffiGKTSIIIAHRLN^IKNADLIIVLRDGEIVEM 578 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6545> which encodes the amino acid 
sequence <SEQ ID 6546>. Analysis of this protein sequence reveals the following: 



Possible site: 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = Transmembrane 152 - 178 ( 159 - 
INTEGRAL Likelihood = Transmembrane 143 - 159 ( 137 - 
INTEGRAL Likelihood = 84 Transmembrane 23 - 39 ( 19 - 
INTEGRAL Likelihood = 68 Transmembrane 68 - 84 ( 60 - 
INTEGRAL Likelihood = 55 Transmembrane 261 - 277 ( 256 - 

Final Results 

bacterial membrane --- Certainty=0.4227 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAD35376 GB:AE001710 ABC transporter, ATP-binding protein 
[Thermotoga maritima] 
Identities = 206/572 (36%) , Positives = 342/572 (59%) , Gaps = 5/572 (0%) 

Query: 2 IKTDIfflLI,KRVLQDLLKKPLFVCILVIASFVQVG--LSVYLPVLIGKAVDMSLSVNSWQT 59 

+K L+R+L L +P ++++ FV V L V P LIGK +D+ + 

Sbjct: 18 LKOT>TATIJaaJ^YL--RPHTFTLI^WFVFVTVSSILGVLSPYLIGR^IDVVFVPRRFDL 75 

Query: 60 LKOTiLGQMLVIIVVOTLIQWVMPLVYSRLLYQYSQQIiKDiaiLEKIHRLPFAYIiDRQTIGD 119 

L + + I + +L+ M+ + L +L+ +L EK+ R+P + DR GD 

Sbjct: 76 LPRYMLILGTIYALTSLLFWLQGKIMLTLSQI3WFRLRKELFEKLQRVPVGFFDRTPHGD 135 

Query: 120 LVSRVITDTEQLINGLQWmQFIMLLTILCTIIAMaQIDWLMIiILVLVlTPSSLFrAR 179 

++SRVI D + + N L QF G++T+ +1 M +++++++ L + P +H- + + 

Sbjct: 136 IISRVIiroVDNIimVLGNSIIQFFSGIVTIiAGBVIMMFIlVNVILSLOTIiSIVPLTVLITQ 195 

Query: 180 FIAQKSFHYAQRQTKSRGNLAQFTEEILRQEGLVQLENaQEQSIOJymTIJilK^ 239 

++ ++ y + G L EE + +++LF +E+ + + +N++ + K 

Sbjct: 196 IVSSQTRKyFYENQRVLGQUaGIIEEDISGLWIKLFTRKEKEMEKFnRVIffiSIJliroGTK 255 

Query: 240 AIFYAST\OTATRFINSViyAr£AGI<aTOimGLFSVGQLTTFiaiVVVQYTKPFNDISS 299 

A ++ + P +AL++G 6 + + +VG + TF+ Q+T+P N4-+S4- 

Sbjct: 256 AQIFSGVLPPLMNMVmLGFALISGFGGWLALKDIITVGTIATFIGYSRQFTRPLNELSN 315 

Query: 300 VLAEIQSSLACAQRLYDLLDIEIKEQEHFLTFKASAVKGQIDFEEVSFSYQKDRPLLKDI 359 

IQ +LA A+R+++4-LD+E +E++ + V+G+I+F+ V FSY K +P+LKDI 

Sbjct: 316 QFNMIQMALASAERIFEILDLE-EEKDDPDAVELREVRGEIEFKNVWPSYDKKKPVLKDI 374 

Query: 360 NFSVPAGSKmiVGPTGAGKSTLXKLITOFYELnAGSIKLDKOTIKCYAK^ 419 

F + G KVA+V<3PTG+GK+T++NLUyiRPy4-+D G I +D + 1+ + LRS 6IV 
Sbjct: 375 TFHIKPGQKMUliTOPTGSGKTTIVNLLrffiFraVDRGQILVDGIDIRKIKRSSBRSSIGIV 434 

Query: 420 LQETWLKDATVHELIAY6SEEASRDEVVaaaK2iAHAHFFIMQLPraTOTyLSASDria^ 479 

LQ+T L TV E + YG+ A+ +E+ AAK H+ FI LP+ Y+T L+ + + LSQ 
Sbjct: 435 LQDTILFSTTVKENLKYGNPGRTDEEIKEAAKLTHSDHFIKHLPEGYETVLTIMSEDLSQ 494 

Query: 480 GQLQLLAIARMFLKKPKVLVLDEA^TSSIDIRTEAVIQESLKELMRGRTSFIIAHRLSTIQ 539 

GQ QLLAI R FL PK+L+I1DERTS++D +TE IQ A+ +LM G+TS IIAHRL+TI+ 
Sbjct: 495 GQRQLIAITRAFLaNPKIIiIIDEATSimjTKTEKSIQAftMWKLMEGKTSIII^^ 554 

Query: 540 SADLILVMDQGRIiVEWGTHRSLMSKNGCYVRL 571 

+ADLI+V+ G+VEGHL+KGYIi 
Sbjct: 555 NADLIIVLRDGEIVEMGKHDELIQKRGFYYEL 586 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 340/566 (60%) , Positives = 433/566 (76%) 

Query: 11 KKLVQDLLSKKSLVGMALLGTWQVCLTVYLPVLIGQAVDVVLSPHSMILLLPIMWKMIA 70 

K+++QDLL K V + ++ + VQV L+VYLPVLIG+AVD+ LS +S L ++ +M+ 
Sbjct: 10 KRVLQDLLKKPLPVCILVIASFVQVGIiSVYIiPVLIGKAVDMSLSVNSWQTLKWLLGQMLV 69 



Query: 71 VILANTIIQWINPLLYNRIiIFHYVASLRKAWlEiaLNLLFIAYLDKRGIGDLISRVTTDTE 130 

+1+ nT+IQW+ PL+Y+RL++ Y L+ ++EK++ LF AYL,D+-i- IGDL+SRV TDTE 
Sbjct: 70 IIWNTLIQMVMPLVYSRLLYQYSQQLKDICLLEKIHRLPFAYLDRQTIGDLVSRVITDTE 129 

Query: 131 QLSNGLIMVENQFFVGLLTIIVTIFSMAKIDLLMLFLVLFLTPLSLFLRRFIAKKSYHL^ 190 

QL NGL MVFNQF +GLLTI+ TI +Ma+ID LML LVL LTP SLPLARFIA+KS+H 
Sbjct: 130 QLlNGXOlVFNQFILGLLTILCTIIBMaQIDVMO^ILVLVLTPSSLFLRRFIAQKSPHYA 189 

Query: 191 QNQTASRGRQTQFIEEMVSQESLIQAFSSQEESSDHFRTINQEYANFSQSAIFYSSTWP 250 

Q QT SRG QF EE++ QE L+Q F+AQE+S + +N+ Y SQ AIFY+STVNP 
Sbjct: 190 QAQTKSRGNIAQFTEEILRQEGLVQLFIRQEQSICDYHVIiNKTYCEASQKAIFYASTVNP 249 

Query: 251 STRFINSLIYGFLAGIGALRIMSGAFSVGQLITFLNYVNQYTKPFNDISSVLSEMQSALA 310 

+TRFINS+iY LAG+GA+RIM+G FSVGQL TFLN V QYTKPFNDISSVL+E+QS+LA 
Sbjct: 250 ATRFINSVIYALLAGLGAVRIMAGLFSVGQLTTFIiNVVVQYTKPFNDISSVLaEIQSSIjA 309

Query: 311 CAERLYSILEESSPNITGTEKLDSSTVKGQIDFKNWFGYKKSKLLLNGINLHIPAGAKV 370 

CA+RLY +S VKGQIDF+ V F Y K + LL IN +PAG-I-KV 

Sbjct: 310 CAQRLYDLLDIEIKEQEHFUiTFKASAVKGQIDFEEVSFSYQKDRPLLKDINFSVPAGSKV 369 

Query: 371 AIVGPTGMKSTLIHLIMRFYEVDGGNILLDCKPITDYEPSQLRQEIGMVLQETWLKSAT 430 

AIVGPTGAGKSTIiIML+MRFYE+D G+I LD PI Y +LR G+VLQETWLK AT 
Sbjct: 370 AIVGPTGAGKSTLlNLLMlFYEIIWSSIKIiDKVPIKCYaKEELRSITGIVLQETWXjKDAT 429 

Query: 431 IHDNIAYANPKASREEVIERAKASNADFFIKQLPNGYDTYLEDlRGDSLSQGQCQLLTIAR 490 

+H4 lAY + +ASR+EV+ AAKAA+A FFI QLP YDTYL + D+LSQGQ QLL lAR 
Sbjct: 430 VHELIAYGSEEASRDEWAAAKAAHaHFFIMQLPKTYDTYLSASDDALSQGQLQLLAIAR 489 
Query: 491 IFLKLPRILILDEATSSIDTRTEVLYQEAFQMLMKGRTSFIIAHRLSTIQTADIILVMVS 550

4-FLK P++L+LDEATSSID RTE ++QEA + LM+GRTSFIIAHRLSTIQ+M+ILVM 
Sbjct: 490 MFLKKPKVLVLDEATSSIDlRTBAVIQEALKELMRGRTSFIIAHRLSTIQSftDLILVMDQ 549 

Query: 551 GEIVEVGNHSELMAQKGIYYQMQNBQ 576 

G +VE G H+ LM++ G Y ++Q + 
Sbjct: 550 GRLVEWGTHASLMSKNGCYVRLQKIE 575 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2121 

A DNA sequence (GBSx2237) was identified in S.agalactiae <SEQ ID 6547> which encodes the amino 
acid sequence <SEQ ID 6548>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.1099 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2122 

A DNA sequence (GBSx2238) was identified in S.agalactiae <SEQ ID 6549> which encodes the amino 
acid sequence <SEQ ID 6550>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following:



Possible site: 37
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-13 Transmembrane
INTEGRAL Likelihood =-10 Transmembrane
INTEGRAL Likelihood = -7 Transmembrane
INTEGRAL Likelihood = -6 Transmembrane
INTEGRAL Likelihood = -4 Transmembrane
INTEGRAL Likelihood = -1 Transmembrane

Final Results

bacterial membrane --- Certainty=0.6477 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35375 GB:AE001710 ABC transporter, ATP-binding protein
[Thermotoga maritima]
Identities = 196/570 (34%), Positives = 327/570 (56%), Gaps = 5/570 (0%)



Query: 61 FLAA-VGWVAITAQYYSSKAAVGYTRQLTEDLYQKVMSLGKKDRDELGTASLITRLTAD 119 
+ A +G V I ++S A+ + L DL++KV+S + + T+SLITRLT D
Sbjct: 60 LlVALICSAVCMIGCWFASYASQNEGftDLRRDLFRKVLSFSISNVNRFHTSSLITRLTKD 119 

Query: 120 TFQIQTGLNQPLRLFLRAPIIVFGAIIMAFSISPSLTIWFLVMVVTLFIIVFVMSRLIiNP 179 

Q+Q + LR+ +RAP++' G I+MR SI+ IH- ++++++ +++ NP 
Sbjct: 120 OTQLQm^VMMIiRIVVEAPLLFVGGIVMaVSIOTKrjSSVLIFLIPPIV]:.LFVl^ 179 

Query: 180 IYIJKIRTSTDYLVKLTRQQLCX3VRVIRAFNQVDRESEIAF^a)Il^^^YTNLQLKRGRLSSLV 239 

++ KI+ STD + ++ R+ L GVRV+RfiT + + E+E F H + A L 

Sbjct: 180 LPRKIQESTDEVNRVVRiaiLLGVRVVRaFRREEYENENFRKaNESLRRSIISAFSLIVFA 239

Query: 240 TPLTFLVWITLWIITOGiniNIANHLLSQGMLVALINYULQILWLLKiyrim 299 

PL +VN+ ++ ++W G + + N+ + G ++A NYL+QI+ L+ + ++ + ++ 
Sbjct: 240 LPLFIFIVmGMIAVLWPGGVLVRNNQMEIGSIMAYTNYIMQIMFSI^IGNIMIFIVRA 299 

Query: 300 yiSAKRIIAVF-ERPS-EIIDDKLEPKYSNKALEVQEMAFSyPNSSEKALSDITFSMNVG 357 

SAKR++ V E+P+ E D+ L ++ + + F Y +++ LS + FS+ G 

Sbjct: 300 SASAKRVLEVU^KPAIEEJUJNAIALENVEGSVSFENVEFRYFENTDPVLSGVNFSVKPG 359 

Query: 358 ETLGIIGGTGSGKSTLIHLLLHiyKVQEGDIDIYHQGKSPDTISMWRTLVRWPQNAQLF 417 

+ ++G TGSGKSTL+HL+ + + G +++ + + R + VPQ LF 

Sbjct: 360 SLVAVIfiETGSGKSTI.MNLIPRLIDPERGRVEVDEU3VRTVKLKDLRGHISAVPQETVLF 419 

Query: 418 KGTIRSNLSLGMKVSEEKLOTALEIAQASDFVKEKDGQLDAPVESFGRNFSGGQRQRLT 477 

GTI+ NL G +++++ A +IAQ DF+ D+ VE GRNFSGGQ+QRL+ 

Sbjct: 420 SGTIKENLKWGREDATDDEIVEAAKIAQIHDPIISLPEGYDSRVERGGRNFSGGQKQRLS 479 

Query: 478 IRRALVQDKIPFLILDDATSALDYLTEARLFKAITKHFNQTNLIIVSQRINSIQISaDRIL 537 

IARALV+ K LILDD TS++D +TE R+ + ++ I++Q+I + AD+IL 

Sbjct: 480 lARALVK-KPKVLILDDCTSSVDPITEKRILDGLKEYTKGCTTFIITQKIPTALLADKIL 538 

Query: 538 LLDKGKQVGFDNHQSLLAHNKVYKSIYHSQ 567 

+L +GK GF H+ LL H K y+ lY SQ 
Sbjct: 539 VLHEGKVAGFGTHKELLEHCKPYREIYESQ 568 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6551> which encodes the amino acid
sequence <SEQ ID 6552>. Analysis of this protein sequence reveals the following: 
Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.47 Transmembrane 157 - 173 ( 149 - 185)
INTEGRAL Likelihood = -7.75 Transmembrane 55 - 71 ( 51 - 74)
INTEGRAL Likelihood = -4.25 Transmembrane 239 - 255 ( 237 - 260)
INTEGRAL Likelihood = -3.77 Transmembrane 20 - 36 ( 19 - 37)
INTEGRAL Likelihood = -3.50 Transmembrane 271 - 287 ( 270 - 288)
INTEGRAL Likelihood = -2.55 Transmembrane 133 - 149 ( 130 - 151)

Final Results

bacterial membrane --- Certainty=0.5989 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

!GB:AL137187 putative ABC transporter [Streptomyces ... 296 6e-79

>GP:CAB69751 GB:AL137187 putative ABC transporter [Streptomyces
coelicolor A3(2)]

Identities = 185/569 (32%), Positives = 306/569 (53%), Gaps = 8/569 (1%)

Query: 1 MKRLRPYVKGYLKESILGPLFKLLEflLFELLVPLDIANNIDISISQHNSQGILRVVLTLF 60 

++ LR Y++ Y K L + L+ L +P L A++ID + + +S IL + 
Sbjct: 3 IRIiRTYIJlPYKKPIALLVALQFLQTCASLYLPTLNAHIlDEGVVKBDSGYILSYGALMI 62 

Query: 61 GLATIGLLLSVTAQYFSSKAAVGFTRQMTDDLFKKIMPLSKEDQDHUSYASLLSRLTSDS 120 

G++ ++ ++ A ++ ++ A R + +F ++ S + H G SL++R T+D 
Sbjct: 63 GISLAQWCNIGAVFYGARTAAALGRDVRGAVFDRVQSFSAEEVGHPG&PSLITRTTNDV 122 
Query: 121
f API+ G +VMA + Ii+ + +V VIi
Sbjct: 123

Query: 181
D + R+ +Q-I- C3 RVI+AF + + E Q F++ N 11+
Sbjct: 183

Query: 241
Sbjct: 243

Query: 301
Sbjct: 303

Query: 350
+IG TG+GK+TL+ L+ +
Sbjct: 363

Query: 417
IiF GT+ +NL G + +DEELW AD +AQAKEFV+ L LH-AP+
Sbjct: 420

Sbjct: 479

Query: 537
Sbjct: 539

An alignment of the GAS and GBS proteins is shown below. 

Identities = 313/568 (55%), Positives = 428/568 (75%), Gaps = 9/568 (1%)

Query: 1 MKRLTYYFKGYIICETIFGPLFKLLEASFELLVPIVIAKMIDETIPRGDRSGLLLQIGLIF 60 
MKHL y KGY+KE+I GPLFKl.LFA FELLVP++IA MID +1 + + G+L + +F 

Sbjct: 1 mkrlrpyvkgylkesilgplfkllealfejLvpllianmidisisqhnsqgilrwltlp so 

Query: 61 FLAAVGVWAlTAQyYSSKAAVGYTRQLTEDIjYQKVMSLGKKDRDELGTASLITRLTADT 120 

LA +G+++++TAQY+SSKftAVG+TRQ+T+DL++K+M L K+D+D LG ASL++RLT+D+ 
Sbjct: 61 GliATIGIJiSVTAQYFSSKaRVGFTRQMTDDLFKKIMFLSKEDQDHLGYASLLSRLTSDS 120 

Query: 121 FQIQTGLNQFLRLFLRAPIIVFGAIIMAFSISPSLTIWFIiVMWTLFIlVFTOSRIiIOT 180 

FQIQTG+NQFLELFLRAPIIV QA++MA+ ISPSLT+WF++MV+ L +VFVMS li P+ 
Sbjct: 121 FQIQTSINQFLRLFLRAPIIVCXSaMVM&YWISPSLTLWFVIMVIVIiTLVFVMSHIJ^ 180 

Query: 181 YLKIRTSTDYLVKLTRQQtaSVRVIRAFNQVDIJESEAFtroiNYHYT^ 240 

YL IR TD+LV+LT QQLQG+RVI+AFNQ +E +AF N + Q +A L++++ 
Sbjct: 181 YLLIRRETDHI.WLTSQQIa3IRVIKm^QTQKELQAFKQQ^IMI>I,SRHQYQARTIJaWLN 240 

Query: 241 PLTFLVVNITLWIIWRGNLNIANHLLSQGMLVALINYLLQILVELLKMTMLVTSLNQSY 300 

P+TFLVVN+TL+++IW+G+ +A+ LSQGMLVALINYLLQIL ELLKMTML+ ++NQS 
Sbjct: 241 PMTFLWNLTLLILIWQGSWQVAHRSLSQGMLVALINYLLQILAELLKMTMLMGTINQSV 300 

Query: 301 ISAKRIIAVF ERPSEIIDDKLEPKYSNKALEVQEMAFSYPNSSEKALSDITFSMNV 356 

. +AKRI VF E P ++ D S L ++ + F+YP ++E +L DI S + 

Sbjct: 301 TASKRINQVFVIJU)EAPrjLLKD---GPISTHLLTIRHLTFTYPGAflEPSLYr)IQr,aBDQ 357 

Query: 357 GETU3IIGGTGSGKSTLimLIiHIYKVQEGDIDIYHQGKSPDTISNmTLVRVVPai!i[a.QL 416 

GE +GIIGGTG+GK+TLI+L+ Y G+I + QG+ P T++ WR ++ +VPQ AQL 
Sbjct: 358 GEWIGIIGGTGAGKTTLIDLICQTYSQYSGEISUMQGEVPKTLTEWKNVIALVPQKAQL 417 

Query: 417 FKGTIRSISTLSLGLG-KVSEEKLM'ALEIAQASDFVKEKDGQr.DAPVESFGRNFSGGQRQR 475 

FKGTIRSNL LG +S+E+LW ALE+AQA +FV QL+APVE+FGR+FSGGQRQR 
Sbjct: 418 FKGTIRSrailiLGQSMPlSDEBLWEALELaQAKEFVftAtiPEQtiEAPVEAFGRHFSGGQRQR 477 
Query: 475 I,TIia?MjVQDKIPFLILDDATSAII)yLTEMlIiFKAITKHFNQmLIIVSQRINSIQNADR 535

L I2iRAL++ K P LILDDA+SALD T RLFKA+ + + +1+V+Q I ++Q AD+ 
Sbjct: 478 IiRIfiRALLKPK-PILIII3I»SSALDNErCRGRLFKALKEEaJSDAniVlLWQSIKNI^ 536 

Query: 535 ILLLDKGKQVGFDNHQSLLAEDIKVYKSl 563 

IL+L++G Q+ F +H L W .+Y+ + 
Sbjct: 537 ILVLEQGHQLDPASHDQLKVSNALYQEM 5 54 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2123 

A DNA sequence (GBSx2239) was identified in S.agalactiae <SEQ ID 6553> which encodes the amino 
acid sequence <SEQ ID 6554>. Analysis of this protein sequence reveals the following: 
Possible site: 43 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.26 Transmembrane 8 - 24 ( 1 - 28)

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB84433 GB:AF027868 RRS-related protein [Bacillus subtilis] 
Identities = 53/140 (37%) , Positives = 78/140 (54%) , Gaps = 2/140 (1%) 

Query: 28 VKKVLQYHDLVQmLRENGSEftNVHLVLSMIYTETKSnAIDVMQSSESISGTTNSITDSH 87 

++++ Y LV+ L G L+L M+Y E+K3 D MQSSES+ N ITD 

Sbjct: 49 LERLTDYKPLVEEELESQGLSNYTSLILGMMYQESKGKGNDPMQSSESLGLKRHEITDPQ 108 

Query: 88 TS1KHGWLLSQNISQAKKAKVDVWTAVQAYNFGSSYIDY\'ADHGGENSIELAKNYSKNV 147 

S+K G+ + K- VD+ T +Q+YW G+ YID+VA+HGG ++ EIAK YS+ 

Sbjct: 109 LSVKQGIKQFTLMYKTGKEKGVDLDTIIQSYHMGAGYIDFVAEHGGTHTEELAKQYSEQQ 168 

Query: 148 m- -PSLGNYHGDTYFYYHP 165 

V PL G+ + +P 
Sbjct: 169 VKKNPDLYTCGGNAKNFRYP 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4143> which encodes the amino acid 
sequence <SEQ ID 4144>. Analysis of this protein sequence reveals the following:

Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.65 Transmembrane 8 - 24 ( 7 - 25)

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 134/200 (67%) , Positives = 165/200 (82%) , Gaps = 1/200 (0%) 

Query: 1 MFKFLKRLIALIIlIFIGYRLVIIHENVKKVLQYHDLVGNTL2iENGSEANVHLVLSMIYT 60 

MF+ LKR + +++ F+ Y+ +IH HV++VL Y +V+ TLftEN ++JiNV LVL+MIYT 
Sbjct: 1 MFEIiKEACSFLLL-FVIYQSFVIHHNVQRVIAYKPMVEKTLaENDTKAlSrUDLVIM 59 



Query: 61 ETKGDAIDVMQSSESISGTTNSITDSHTSIKHGVTLLSQNISQAKKFVKVDVWTAVQAYNF 120 
ETKG DVMQSSES SG NSITDS SI+HGV LLS H++ A++A VD WTAVQAYNF 
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Sbjct: 60 ETKGGEAD^'MQSSESSSGCKNSITDSQASIEHGVNLLSHNLALAEEAGVDSWTAVQAYNF 119 

Query: 121 GSSYIDTCADHGG3NSIELAm"SKWi/APSIjGlSraGDTYPYYHPIALISGGKL 180 
G++YIDY+A+HGG+N+++LA YSK WAPSLGN +G TYFYYHPLaLISGGKLYKNGGN 
Sbjct: 120 GTAYIDYIAEHGGQNTVDLATTYSKTVVAPSLGNTSGQTYFYYHPIMiISGGKLYKISE^ 179

Query: 181 lYYSREVQEmYLIKIMELF 200 

lYYSREV FNLYI1I++M LF 
Sbjct: 180 lYYSREVHPNLYLIELMSLF 199 

SEQ ID 6554 (GBS244) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 59 (lane 4; MW 23.1kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 2; MW 48kDa).

GBS244-GST was purified as shown in Figure 211, lane 5. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
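
Each alignment in these Examples is preceded by a summary line giving identities, positives and (where present) gaps as counts over the aligned length. The following Python sketch is illustrative only and is not part of the described analysis. It parses such a line and flags fields whose printed percentage does not match the counts. It assumes that percentages are truncated to whole numbers; a flagged field may equally reflect a different rounding rule or a transcription error, so it is a prompt for checking rather than proof of a mistake. The function names are hypothetical.

```python
import re

# Matches e.g. "Identities = 134/200 (67%)"; the field names follow the summary lines above.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, printed_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }


def inconsistent_fields(line):
    """List fields whose printed percentage differs from count * 100 // length.

    Assumption: percentages are truncated, not rounded. A "length mismatch"
    entry is added when the fields do not share one aligned length.
    """
    fields = parse_summary(line)
    flagged = [
        name
        for name, (count, length, percent) in fields.items()
        if length == 0 or count * 100 // length != percent
    ]
    if len({length for _, length, _ in fields.values()}) > 1:
        flagged.append("length mismatch")
    return flagged


line = "Identities = 134/200 (67%) , Positives = 165/200 (82%) , Gaps = 1/200 (0%)"
print(parse_summary(line))
print(inconsistent_fields(line))
```

A field that the pattern cannot read at all (for example, one containing a letter where a digit is expected) is simply absent from the parsed result, which is itself a useful signal when checking scanned text.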

Example 2124 

A DNA sequence (GBSx2240) was identified in S.agalactiae <SEQ ID 6555> which encodes the amino 
acid sequence <SEQ ID 6556>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2401 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9837> which encodes amino acid sequence <SEQ ID 9838> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB71302 GB:AJ130879 hypothetical protein [Clostridium
sticklandii]

Identities = 32/95 (33%), Positives = 53/95 (55%), Gaps = 1/95 (1%)

Query: 235 LSPEKLADQLFDDNLTARLTFVDELKDAIPGPVQVSDIDHSRQIKKLENQKLSLSNGIEL 294

LS EK + F++ + + + L A Q+ ++ + +K E QK+ +GIE+ 

Sbjct: 2 LSVEKALETAFEETDEIKAIYKEALSKAGIENEQI-EVSETALKRKFEIQKIITESGIEV 60 

Query: 295 IVPNNVYQDAESVEFIQNPDGTYSILIKNIQDIQN 329 
+P N Y D +EF+ N DGT S++IKNI +IQ+

Sbjct: 61 KIPVNYYGDPSKLEFVANGDGTVSLVIKNIGNIQS 95 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6557> which encodes the amino acid 
sequence <SEQ ID 6558>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3336 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 246/325 (75%), Positives = 286/325 (87%)

Query: 6 MMDFYlKQIIIHQFSPNDTELVLSDTPLTLTPRIDDYFRKKLSKVFSDEAKRGyFGEDNV 65 

M+D YIK+I+IHQFSPNDTEL+LSD +++TPRID+YFRKia)+KVFSDEAKRG F +N 
Sbjct: 1 MUDSYIKRIVIHQFSEfflDTEIjIjLSDRLVSITPRIDEYFIUCKIjAKVFSDmKRGQFEAl^ SO , 

Query: 66 F^eHLQDDLYVSSCXJI^QLWKBE;FVISEDQKTNDIIVFIQFDKDG^lBHFAFLRISLKEQFA 125

F + + DDL +S lAQLWKB FVISEDQKTNDLVF+QFDKDG FAFIiRI+LKEQFA 
Sbjct: 61 FFirTI6DDLLETSVTIAQLWKEAFVISEDQKnilDLVFVQFDKDGEPFFAFLRIAI.KEQFA 120 

Query: 126 HVSENQBQPITITQNimPJSAAQTEDEALVVNKSSKQYYLIEKRIK^ 185 

H+S+N E P T+TC2NNLPS QTPDEALV+N S QYYI1IEKR+KHNGSFANYFSE+LL+ 
Sbjct: 121 HLSDim!HPFTVTQma.PSPTQTEDEALVIllLKSGQYYLIEiaiVKHNeSEAOT 180 

Query: 186 VQPEQSVKKSIKMVEQTAQKIAEniJENKDDFSFQSKMKSAIYKMLEEEQEriSPEKLADQL 245 

V PEQSVKKSIKM+EQTAQKIAE+EN+DDF+FQSKMKS ++K LE + LSPEKLSDQLP 
Sbjct: 181 VTPEQSVKKSIKMIBQTAQKIAHHENQDDFTFQSKMKSTLFKQLEADDVrjSPEKLADQLP 240 

Query: 246 DDNLTARLTFVDELKDAIPGPVQVSDIDHSRQIKKIiENQKLSLSNGIELIVENll^ 305 

DnNLTARLTFVD++KD IP P+++SDI+HSRQIKKLENQKLSLSNGIEL VPN +YQDAE 
Sbjct: 241 DDNLTAELTFVDQVKDVIPEPIKISDIEHSRQIKKLEISIQKLSIiSNGIELTVENAIYQDAE 300 

Query: 306 SVEFIQNPDGTYSILIKNIQDIQNK 330 

+VEF+ N DGTYSII1IKNI+DI+ K 
Sbjct: 301 AVEFLLNDDGTYSILIKNIEDIKIK 325 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
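
The "Final Results" blocks in these Examples assign a certainty value to each of three predicted locations (cytoplasm, membrane, outside). The following Python sketch is illustrative only and is not part of the described analysis. It reads such a block and reports the location with the highest certainty. The pattern tolerates the stray spaces that scanning introduces around "=" and ".". The function names and the layout of the sample block are assumptions made for the example; the values are those listed above for SEQ ID 6556.

```python
import re

# One result line: location, optional separator, certainty value, bracketed label.
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for every result line found in text."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        location, whole, fraction, label = match.groups()
        results[location] = (float(f"{whole}.{fraction}"), label.strip())
    return results


def best_localisation(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


block = """
bacterial cytoplasm --- Certainty=0.2401 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""

results = parse_final_results(block)
print(results)
print(best_localisation(results))
```

Keeping the bracketed label alongside the number allows a caller to distinguish an "Affirmative" call from a "Not Clear" one without re-reading the text.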

Example 2125 

A DNA sequence (GBSx2241) was identified in S.agalactiae <SEQ ID 6559> which encodes the amino 
acid sequence <SEQ ID 6560>. This protein is predicted to be Serine hydroxymethyltransferase (glyA-1). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3876 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

serine hydroxymethyltransferase [Thermotoga maritima]
Positives = 307/416 (73%), Gaps = 7/416 (1%)

Query: 9 KEFDQELWQAIHDEEIRQQNNIELIASENWSKAVMaAQQSVLTNKSAEGYPSHRYYGGT 68 

K+ D E+++ + +E RQ+ +ELIASEN S AV+ GS+LTNKXREGYP RYYGG 
Sbjct: 6 K(3VDPEIYEW^VNELKRQEYGLELIASENFASIAVIETM6SML.TNKyAEGyPK^ 65 

Query: 69 DCVDVVESLAIERAKTLFNaEFAMVQPHSGSQaNAflAYMALIBPGDTVI.GMDIiAAGGHLT 128 

+ VD E AIERAK LP A+FAHVQPHSGSQftN A Y+AL +PGDT+4GM L+ GGHLT 
Sbjct: 66 EWVDRAEERAIERAK3lLFGAKFAMVQPHSGSQaNMAVyLaLAQPGDTIMGMSLSHGGHLT 125 

Query: 129 HQASVSPSGKTYHFVSYSVDPKTEMLDYDNILKIAQETQPKLIVAGAaAYSRIIDFEKFH 188 

HGA V+FSGK + V Y V+ +TE +DYD + ++A E +PK+IVAG SAY+RIIDF++FR 
Sbjct: 126 HG&PVNFSGKIFKWPYGVNLETETIDYDEVRRLALEHKPKIIVAGGSAYARIIDFKRFR 185 

Query: 189 QIADAVDAYLMVDMAHIAGLTaSGHHPSPIPYAHVTTTTTHKTLRGPRGGLILTNDEAIA 248 

+IAD V AYIiMVDMAH AGLVA+G HP+P+ YAHV T+TTHKTLRGPRGGLILTND lA 
Sbjct: 186 EIADEVGAYLMVDMAHEAGLVAAGIHPNPLEYAHVVTSTTHKTLRGPRGGLILTNDPEIA 245 



Query: 249 KKINSAVFPGLQGGPLEHVIAAKAVALKEALDPSFKIYGEDIIKNaQaMAKVFKEDDDFH 308 
Query: 309 LIStXSTDNHLFLVDTOKVIENGKKACJilVLEEWITlimCNSI 368

+-1-S GTD HIiFLVD+T GK A+ IE IT+NKN+IP E+ SPF SGIRIGTPA 

Sbjct: 305 IVSGGTDTHLFLVDLTPKDITGK2iaEKMBSCMITVNKOTIPlffiKRSPFVASGIRIGTP 364

Query: 369 ITSRGMGVEESRRIAELMIKALKN--HENQDVi:iTEVRQE IKSLTDAFPLYEN 418 

+T+RGM EE IAE++ L N EN V EVR+E ++ L + FPLY + 
Sbjct: 365 VTIKGMKEEE^ffiEIAEMIDLVDSWIDENGTVKPEVREEVSKKVRELCERPPLroD 420 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6561> which encodes the amino acid
sequence <SEQ ID 6562>. Analysis of this protein sequence reveals the following: 

Possible site: 47 



Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15707 GB:Z99122 serine hydroxymethyltransferase [Bacillus subtilis]
Identities = 250/407 (61%), Positives = 311/407 (75%), Gaps = 2/407 (0%) 

Query: 14 DKELWDAIHAEEERQEHHIELIASENWSKAVMTU^QGSVLTNKYAEGYPGNRYYGGTKCV 73 

D4-++++AI E ERQ+ lELIASEN VS+AVM AQGSVLTNKVAEGYPG RVYGG E V 
Sbjct: 8 DEQVBTOIKNERERQQTKIELlASENFVSEAVMEaQGSVLTNKXAEGYPGKRYYGGCEHV 67 

Query: 74 DIVETIjAIERAKKl,FG2^ANVQ%HS6SQAISIAAATniALIEAGDn^^ 133 

D+VE +A +HAK++PGA NVQ HSG+QSN A Y ++E 001^^+1.+ GGHLTHGS 
Sbjct: 68 DVVEDIAia)RAKEIFGaEHVNVQPHSG?iQ2VNMAVYFTIi:iEQC3DTV^ 127 

Query: 134 PVNFSGKTyHFVGySVDTDTEMENYEAILEQaKAVQPKLIVaGaSAYSRSIDPEKFRAIA 193 

PVNFSG Y+FV Y VD +T+ ++Y+ + E+A A +PKIiIVAG2VSAY R+IDF+KFR lA 
Sbjct: 128 PVNFSGVQYNFVEYGVDKETQYIDYDDVREKflLaHKPKl.IVAGftSAYPRTIDFKKFREIA 187 

Query: 194 DHVGAYLM\'DMAIIIAGLVAAGVHPSPVPYflHIVTSTTHKTLRGPRGGLILTNDEALAKKI 253 

P VGRY MVDMAHIAGLVAAG+HP+PVPYA VT+TTHKTLRGPRGG+IIi +E KKI 
Sbjct: 188 DEVGAYFMVDMAHIAGLVAAGLHPNPVPYADFVTTTTHKTLRGPRGGMILCREE-FGKKI 246 

Query: 254 NSAVFPGLQGGPLEHVIAAKAVAFKEALDPAFKDYAQAIIDNTAflMflAVFAQDDRFRLIS 313 

+ ++FPG+QGGPL HVIAAKAV+F EL FK YAQ +1 N +A ++ +L+S 
Sbjct: 247 DKSIFPGIQGGPIMmAftKAVSFGEVI^DFKTYAQNVISNaKRLAEALTKEG-IQLVS 305 

Query: 314 GGTDNHVFLVDVTKVIANGKIiAQNLIiDEMlTIJnCNAlP 373

GGTE«NH+ LVD+ + GK+A+++LDE+ IT NKMAIP++ PF TSSIR+G AA+TS 
Sbjct: 306 GGTDNHLlLVDIJlSLGLTGKVaEirVLDEIGITSNKNAIPYDPEKPFVTSGIRLGTRAV^ 365 

Query: 374 RGMGVKESQTIARLIIKALVNHDQETItEEVRQEVRQLTnRFPLYKK 420 

RG + + +1 AL NH+ E LEE RQ V LTD FPLYK+ 

Sbjct: 366 RGFDGDALEEVGAIIALRLKNHEDEGKLEERRQRVAJiLTDKFPLYKE 412 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 330/417 (79%) , Positives = 368/417 (88%) 

Query: 1 MIFDKnNFKEFDQELWQAIHDEEIRQ(3SINIELlASENWSKAVMRAQGSVLTI!i^ 60 

MIFDK N ++FD-1-ELW AIH EE RQ+++IELIASEN+VSKAVMaAQGSVLTNKyAEGYP 
Sbjct: 3 MIFDKGNVEDFDKELWDAIHaEEERQEHHIELIASENMVSKA\n«iaftQGSVLTNKYflE6YP 62 



Query: 61 SHRYYGGTOCVDVVESLAIEKAKTLFmEFANVQPHSGSQMIAAAYMALIEPGDTVLGMD 120 
+RYYGGT+CVD+VE+LAIERAK LF A PANVQ HSGSQANaAAYMALIE GDTVLGMD 



WO 02/34771 PCT/GB01/04789

-2395-



Sbjct: 63 GITOYYGGTECVDIVETLAIERAKKLFGAA?AWQ'M;SGSQANAZyiyi»ffiI,IEAGDTVLC3MD 122

Query: 121 IiftAGGHLTHGaSVSFSGKTYHFVSySVDPKTEMLDYDNILKiaQBrQPKLIVAGASA.YSR 180

IAAGGHLTHG+ V+FSGKTYHFV YSVD TEML+y+ IL+ A+ QPKLIVAGASAySR
Sbjct: 123 IAaGGHLTHGSPVNFSGICI™FVGYSVDTDTE^mIYEAILEQRKaVQPKLIVaG^^^ 182

Query: 181 IIDPEKFRQIAimVnA.YLM\nmHlAGI.VASGHHPSPIPYAHmTTTHKTIiRGPROT^ 240

IDFEKFR IKD V AYLMVDMflHIAGLVR+G HPSP+PYAH+ T+TTHKTLRSPRGGLI
Sbjct: 183 SIDFEKFRAIJmHVGAYLMVDMaHIAGLWiAGVHPSPVPYAHIVTSTTHKTLRGPRSSIliI 242

Query: 241 liTlimmiRKKllSSAVF^iaSGPLWmi^^ 300

LTNDEA+AKKINSAVFPGLQGQPLEHVIAAKAVA KEALDP-HFK Y + 11 N AMA V
Sbjct: 243 LTtroEBIAKKINSAVFPGLQGGPLEHVIAAKAVAFKEAXJDPAFKDXRQAIIDOTAA^ 302

Query: 301 FKEDDDFHLISDGTDNHLFLVDVTKVIEiraKKaQlWLEEVNITIJ!^ 360

F +DD F LIS GTUNH+FIiVDVTKVI NGK AQN+L+EVNITLNKN+IPFE LSPFKTS
Sbjct: 303 FAQDDRFRLISGGTDNHVFI.VDVTKVIANGK]:AQNLLDEWITriNKmiPFET^^ 362

Query: 361 GIRIGTPAITSRGMGVEESRRIAEIiMIKALKNHENQDVLTEVRQEIKSIiTDAFPr.YE 417

GIRIG AITSRGMGV+ES+ lA L+IKAL NH+ + +L EVRQE++ LTDAFPLY+
Sbjct: 363 GIRIGOUilTSRCSIGVKESQTiaRLIIKaLVNHDQETIIiEEVRQEVRQLTnAFPLYK 419



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2126

A DNA sequence (GBSx2242) was identified in S.agalactiae <SEQ ID 6563> which encodes the amino 
acid sequence <SEQ ID 6564>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2289 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9839> which encodes amino acid sequence <SEQ ID 9840> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35934 GB:AE001752 conserved hypothetical protein [Thermotoga maritima] 
Identities = 71/198 (35%), Positives = 114/198 (56%), Gaps = 4/198 (2%)







Query: MNDLGQILEDHGAVIMPTETVYGIFAKALSEEAVNHVYEDKKRPRDKAMN^ 60

+ + ++L + +1 PTETVYGI A A +EEA +++LK+RP D + ++I F+ +
Sbjct: 17 LKEAAELLRNGEVIIFPTETVYGIGADAYNEEACKKIFKLKERPADNPLIVHIHSFKQLE 76

Query: 61 KYSKNQPTYLKQLYDAFLPGPLTIIL-EASQEVPHWINSGLLSVGFRMPKHPVTLDMIAN 119

+ ++ +L L F PGPLT+I + S+++P + + L +V RMP HPV h +1
Sbjct: 77 EIAEGYEPHLDFL-KKFWPGPLTVIFRKKSEKIPPWTADLPTVAVRMPAHPVALIOjIEL 135

Query: 120 HG-PLIGPSANISGCDSGRVFSEIQKQFNHQV-LGIEDDKRLTGVDSTIIDLSGDRVKIL 177

G P+ PSANISG S + + F +V L I+ G++STI+DL+ ++ +L
Sbjct: 136 FGHPIAAPSANISGRPSAraVKHVIEDFMGKVKLIIDAGDTPFGLESTIVDLTKEKPVLL 195

Query: 178 RQGAITQEVLTATIPELI 195

R G + EL PEL+
Sbjct: 196 RPGPVEVERLKELFPELV 213





A related DNA sequence was identified in S.pyogenes <SEQ ID 6565> which encodes the amino acid 
sequence <SEQ ID 6566>. Analysis of this protein sequence reveals the following: 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0282 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/196 (64%), Positives = 154/196 (77%)

Query: 1 ^®roLGQILEDHGRVIMPTETTOGIFAKALSEEAVNHVYELKKRPRDKAMNLNICDFETIL 60 

M L I+E A+++PTET7yG+FAKAL E+AVN VY+LK+RPRDKAMNLN+ DP +IL 
Sbjct: 11 »ffiWLASIIESGI»LVIiPTBTWGLFfiKAI£)EK&VNAVYDLK^ 


Query: 61 KYSKNQPTYLKSL-XDAFLPGPLTIIiaASQEVPHWINSGLLSVGFRMPKHPVTIJJMIA^ 120 

+SK QP TLK+LY AFLPGPriTIlL+A+ +VP+WINSGL +VGFR+P HP+T +1 
Sbjct: 71 AFSKEQPRYLKKLyQAFLPGPLTIILKRNDQVPYWINSGLSTVGFRLPSHPITAALIQKT 130 

Query: 121 GPLIGPSANISGCDSGRVFSEIQKQFHHQVLGIEDDKaLTGVDSTIIDLSGDRVKILRQG 180

GPLIGPSAN+SG SGRVF I + F+ QV G DD LTG DSTI+DLSG+R ILRQG 
Sbjct: 131 GPLIGPSANLSGKASGRVFDHIMQDFDFQVFGYADDPFLTGKDSTILDLSGERAVILRQG 190 

Query: 181 AITQEVLTATIPELIF 196 
AIT+E L A +EEL F

Sbjct: 191 AITKEELLJOTTPELRF 205 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2127

A DNA sequence (GBSx2243) was identified in S.agalactiae <SEQ ID 6567> which encodes the amino 
acid sequence <SEQ ID 6568>. This protein is predicted to be protoporphyrinogen oxidase (hemK). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07493 GB:AP001519 protoporphyrinogen oxidase [Bacillus halodurans] 
Identities = 94/236 (39%), Positives = 132/236 (55%), Gaps = 12/236 (5%)

Query: 49 DTDQQLMENIFQQLKKHRSP---QYITGKAYFRDLIFFVDERVLIPRPETEELVDLILSE 105 

+ D+L + + + L HS Q++ G F F VD+ VLIPRPETEELV +L E 
Sbjct: 46 ELDGELFQRLEEDLAAHASGVPVQHLIGVESFYGRQFQVDQHVLIPRPETEELVLAVLKE 105 

Query: 106 NKVEDCSVLDIGTGSGAIAISLKKERPSWDVLASDISVSALDLAKENANNCDAEV 160 

K E+ ++LDIGTGSGRIA++L E +V A DIS AL +A +NA A V 
Sbjct: 106 IRRQFKKEEEITIIJDIGTGSGAIAVTlja.EEERIWrA\n3ISRDALQVASDHARRLGANV 165 

Query: 161 TFIESDV---FSNIS6KPDIIVSNPPYISm)KDEVGKNVIiASEPHSALFADEEGIiAiyR 217 

I D+ F +FD+IVSNPPYI +KD + +V EP ALF +GL +YR 

Sbjct: 166 QLIHGDIiSEPFLKTOERFDVIVSNPPYXPTVEKOTIAUHVKDHEPAIALFGCSVaStDVTO 225 

Query: 218 KIIENSREYL-QPRGKLYFEIGYKQGDDLRSJjIiKRYFPNMRCRVLKDIEGKDRMW 272 

+++ + +G .+ EIG QG D+ L+4- +P VL D+ GKDR+V+ 

Sbjct: 226 RmSQLPALTKEEKGMVAMilGAGQGMDVEKI^IomYPKAAVDVLYDUq^GKDRIVI. 281 
A related DNA sequence was identified in S.pyogenes <SEQ ID 6569> which encodes the amino acid
sequence <SEQ ID 6570>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4324 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 174/274 (63%) , Positives = 207/274 (75%) 

Query: 1 MNY^AQLIKHYGQLLEACGEEWENFIYVLKDLKQWSTTDYLnNQNSSVSDTDQQLMENIFQ 60

MNYA 1.1+ y IiE E+ EN YV +++K+WS+ D Ii-(-+QN +V+ D L+E+IF 
Sbjct: 1 MNyATtilRTYEDKLEQIDEDRBNLAYVFREIKEWSSLDMLIHQNQaOTPEtaVLL^^ 60 



Query: 61 QLKKHRSPQYITGKAYFRDLIPFVDERVLIERPETEELVDLII£H(nWEDCSVLDIGIGS 120
L +H SPQYITG AYFRDL VD+RVLIPRPETEELVD+IIrt-EN +VLDIGTGS

Sbjct: 61 SLSQHLSPQYITGKaYFRDLKLAVDKRVLIHRPETEELVDMIMENLDAPIJiraj^ 120 

Query: 121 GAIAISIiKKERPSWDVLASDlSVSALDIjftKENBNNCn^^ 180 
GAIAISIiKKERP+W V ASDIS +ALDLAK NA+ H-l-TFIESDVFS IS FDIIVS 
Sbjct: 121 GAIAISmKEREtWQVTASDlSRflaiJJLAKaNAnAYQLDITPIESDVFSLISETFDIIVS 180

Query: 181 NPPYISYNDKDEVGKNVLASEPHSALFADEEGLAIYRKIIENSREYLQPRGKLYFEICSYK 240 

NPPYISY DK+EV NVL SEPH ALFA E G AIYRKIIE + YL GKLYFEIGYK 
Sbjct: 181 NPPYISYEDKEEVSIJmiQSEPHIJUJFAKENGYAIYRKIIEQADNYLTKEGia.YFEIGYK 240 


Query: 241 QGDDLRSLLKRYFENMRCRVLKDIFGKDRMWLD 274 

Q + ++ +L+ YFP R + DIFGK+RMW+D 
Sbjct: 241 QAEGIKDMLQAYFPQRHIRAVTDIFGKERMWVD 274 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2128 

A DNA sequence (GBSx2244) was identified in S.agalactiae <SEQ ID 6571> which encodes the amino 
acid sequence <SEQ ID 6572>. This protein is predicted to be peptide chain release factor RF-1 (prfA). 
Analysis of this protein sequence reveals the following:
Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15718 GB:Z99122 peptide chain release factor 1 [Bacillus subtilis]

Identities = 211/351 (60%), Positives = 280/351 (79%), Gaps = 1/351 (0%) 

Query: 5 DQLQAVEDRYEELGELLSDPDWSDTKREMELSREEASTRETVTAYREYKQVIQNISDAE 64 
DH-L+++E+RYE+L ELLSDP+W+D K+ E S+E++ +ETV YR+Y+ + ++DA+ 
Sbjct: 3 DRIiKSIEERYEKIiIffiLLSDPE\AnroPKia^REYSKEQSDIQETVDVYRQYRDASEQIM 62

Query: 65 EMIKDASGnAELEEM&KEELKESKAaKEEYEERLKILLLPKDPNDDKNIILEIRGAAGGD 124 
M+++ naE+ +M KEE+ E + E ERLK+LL+PKDPNDDKN+I+EIRGAAa3+ 
Sbjct: AMLEEKL-DJiEMRDMVKEEISEIjQKEaETLSERLKVIjLIPKDPiroDKWI^^ 121

Query: 125 EAMiFAGDLLTMyQKYAETQGWRFE\mSSWGVGGIKE^AffiMVS(MSVYSKLKYES(^ 184
EAALFAG+L MY +YAE Q(3W+ EVME++V G GG KE++ M++6 YSKLKYE+GAH
Sbjct: 122 EflALFAGNLYRMYSRYAELQGWKTEVNEMNVTGTGSYKEI I FMITGSGAYSKLKYENGAH 181

Query: 185 RVQRVPVTESQGRVHTSTATVLVMPEVEEVEYEIDQKDLRVDIYHASGfiOSQiraiKVATA 244
RVQRVP TES GR+HTSTATV +PE EEVE +1 +KD+EVD + +SG GGQ+-ra +A
Sbjct: 182

Query: 245
VR+ H+PTG+ V Q+E++Q KN++KAMK++EAR+ D F Q AQ E D RKS VG+GDR
Sbjct: 242 VRIiTHLPTGVWSCQDEKSQIKNKEKaMKVLRARIYDKFQQEAQAEYDQTRKSAVGSGDR 301

Query: SERIRTYNFPGNRWDHRI6LTriQK):£ITILSGKMDEVIDALVMYDQTQKi:i^ 355
SERIRTYNFPQNRVTDHRieLT+QKLD IL GK+DEV++AL++ DQ KL+
Sbjct: 302 SBRIRTYNFPQNRVTDHRIGLTIQKIjDQILEGKLDEVVEftLIVEDQRSKLQ 352

A related DNA sequence was identified in S.pyogenes <SEQ ID 6573> which encodes the amino acid 
sequence <SEQ ID 6574>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 349/358 (97%) , Positives = 354/358 (98%) 

Query: 1 ^INIYDQl:.QaVEDRYEEIfiELriSDP^WSDTKRPMEl:JSREEASTRETVTAyREYK]QVIQIS^: 60 

MNIYDQLQAVBDRYEELGEriSDPDWSDTKRPMEIiSREE +TRErVTAYREYKQVIQ I 
Sbjct: 1 MNITDQI^VEDRYEELGELLSDEDWSDTKRFhffil^REEXNTRETVTAYREYKQVIQTI 60 

Query: 61 SDAEEMIKDASGDAELEEMAKEXZiKBSKAAKEEYEERLKILLLPKDPtmCKN 120 

SDAEEMIKDASGD EI^EMAKEELKESKAAKEEYEE+LKILLLPKDPNDDKNIILEIRGA 
Sbjct: 61 SDAEEMIKDftSGDPELEEMAKEELKESKAAKEEYEEKLKlLLLPKDPNDDKNIILEIRGA 120 

Query: 121 AGGDERALFflflDLLTMYQKSAEnQGWRFEVMESSVIKVGGIKEVVAMVEGQSVYSKL 180 

AGGDEAftLFAGDLLTMYQKXAETQGWRFEVMESSV^K3VGGIKEVVAIWSGQSVYSKrJKYE 
Sbjct: 121 AGGDEAALFAGDLLTMYQKyAETQGlffiFEVMESSVNGVGGIKEVVaMVSGQSVYSKLKYE ISO 

Query: 181 SGAHRVQRVPVTESQGRVHTSTATVLVMPEVEEVEYEIDQKDLRVDIYHASGAGGQNVNK 240 

SGAHRVQRVPVTESQGRVHTSTATVLVMPEVEEVEY+ ID KDLRVDIYHASGAGGQNVKK 
Sbjct: 181 SGa^VQRVPVTESQGRVHTSTATVLVMPEVEEVEYDIDPKDLRVDIYHRSGAGGQNV^ 240 

Query: 241 VATAVRMVHIPTGIKVEMQEERTQQKNRDKaMFaiRARVaDHFAQIAQDEQDAERKSI^ 300 

VATAVRMTOIPTGIK^MQEERTQQKNRDKAMOIRflEVADHPAQIAQDEQDAERK^ 
Sbjct: 241 V&TAVR^WHIPTGIroffiMQEERTQQKNRDKAMtaIRARVM)HFAQIAQDEQnAERKSTVG 300 

Query: 301 TGDRSERIRTYWFPQNRVTDHRIGLTLQKLDTII£GK^roEVIDALV^raJQTQKLEAUI 358 

TGDRSERIRTYNFPQNRVTDHRIGLTLQKLDTILSGKMDEVIIffiLVMYDQT+KLE+LN 
Sbjct: 301 TGDRSERIRTYWFPQNRVTDHRIGLTLQKLDTILSGKMDEVIDALVMYDQTKKLESLN 358 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2129

A DNA sequence (GBSx2245) was identified in S.agalactiae <SEQ ID 6575> which encodes the amino 
acid sequence <SEQ ID 6576>. This protein is predicted to be thymidine kinase (tdk). Analysis of this 
protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9841> which encodes amino acid sequence <SEQ ID 9842> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MAQLYYKYGTIWSGKTIEILKVAHNYEEQGKPWIMTSALDTRDEFGWSSRIGMRREAV 60 

MAQLYYKYGTMNSGKTIEILKVAHNYEEQGK WIMTSA+DTRD G VSSRIGM+R+A+ 
Sbjct: 1 MAQLY¥KYGT^WSGK3'IEILKVAHNYEEQGKGWIMTSAVDTKIX;VGYVSSRIGMKRQaM 60 

Query: 61 PISDDmiFSYIQNLPQKPYCVLIDECQPLSKKNVYDLARVVDDriDVPVMAFGLKNDFQN 120 

I DD DI YI+NLP+KPYC+LIDE QFL + +VYDIiRRVVD+IiDVPVMJ^FGLKISIDF+N 
Sbjct: 61 AIEimTDII/3YIKNLPEKPYCIIiIDEJiQFLKRHHVYDljftRVVDEIiDVPVMAFGLK]!^ 120 

Query: 121 NLFEGSKHLIlIlLADKIDEImCQYCSKKAT^mlRTENGKI>VYEGDQIQIG(a^ 180 

LFEESKHLIJJLADKI+EIKTICQYCS+KAT^TOlRT^-+GKPVY+G+QIQIGGaSlETyIFVC 
Sbjct: 121 ELEEGS^a^IlLIiaDKIEEIKTICQYCSRKAT^mJRTDHSKI>VYDGEQIQI(3GB^ 180 

Query: 181 RKHYENPDI 189 

RKHYF PDI 
Sbjct: 181 RKHYFKPDI 189 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6577> which encodes the amino acid 
sequence <SEQ ID 6578>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
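
Each "Final Results" block lists one certainty score per candidate location, and the location with the highest score is the one marked "Affirmative". The following is a minimal sketch of reading such a block, assuming the lines have been cleaned to the form `bacterial <site> --- Certainty=<score> (<call>) < succ>`; the regular expression and the function names are illustrative assumptions and are not part of the prediction program's own interface.

```python
import re

# Matches cleaned result lines such as:
#   bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*(?P<score>\d+\.\d+)\s*"
    r"\((?P<call>Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every 'bacterial ...' line."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            results[match.group("site")] = (
                float(match.group("score")),
                match.group("call"),
            )
    return results


def predicted_site(results):
    """Name of the location with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


example = """\
bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(example)
print(predicted_site(results))  # cytoplasm
```

Lines that are still damaged by scanning simply fail to match and are skipped, so a block that yields fewer than three entries is a useful signal that it needs manual repair.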



An alignment of the GAS and GBS proteins is shown below. 

Identities = 174/189 (92%) , Positives = 184/189 (97%) 

Query: 1 ^mQLYYKyGTMNSGKTIEILKVaHNYEEQGKPVVIMTSALDTRDEFGVVSSRIGMRREAV 60 

+AQLYyKYGTMNSGKTIEILKVAHNYEEQGKPVVIMTSaLDTRD FG+VSSRIGMRREA+ 
Sbjct: 1 UVaLYYKYGTMNSGKTIEILKyAHNYEEQGKFVVimSALDTRIWFGIVSSRlGMRREa 60 

Query: 61 PISDD^mIFSYIQ^^:,PQKPyCVLIDECQFLSKK^IVYDLaRVVDDIlIm'VMaFGLK^ 120 

PIS+DMDIP++I L +KPYCVLIDE QPLSK+NVYDLRRWD+L+VPVMAFGLKNDFQN 
Sbjct: 61 PISirorroiFTFIAQLEEKPYCVLIDESQFLSKQNraSLaRVVDEIJIIVPVMRFGLKNDFQN 120 

Query: 121 KLFEGSKHLLLtiaDKIDEIKTICQYCSKK&TMVLRTENGKPVYEGDQIQIGGNETYIPVC 180 

NLFEGSKHLLLLADKIDEIKTiaQYCSKKATMVlRTENGKPVYEGDQIQIGGNETYIPVC 
Sbjct: 121 NLFEGSKHLLLIMKIDEIKTICQYCSKKRTMVLRTENGKPVYEGDQIQIGGNETYIPVC 180 



WO 02/34771 

-2400- 

PCT/GB01/04789 

Query: 181 RKHYFWPDI 189 

RKHYFHPDI 
Sbjct: 181 RKHYFNPDI 189 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2130 

A DNA sequence (GBSx2246) was identified in S.agalactiae <SEQ ID 6579> which encodes the amino 
acid sequence <SEQ ID 6580>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3995 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:JUiA26046 GB:M95650 4-oxalocrotonate tautomerase [Plasmid pWW0] 
Identities = 27/60 (45%), Positives = 36/60 (60%) 

Query: 1 MPFVKIDLFEGRSQEQKNELAREVTEWSRIAKAPKENIHVFINDMPESTYYPQGELKKK 60 

MP +1 + EGRS EQK L REV+E +SR AP ++ V 1 +M +G + GEL K 
Sbjct: 1 MPIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEIIAKGHFGIGGEIASK 60 


A related DNA sequence was identified in S.pyogenes <SEQ ID 6581> which encodes the amino acid 
sequence <SEQ ID 6582>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4128 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 56/60 (93%), Positives = 59/60 (98%) 

Query: 1 MPFVKIDLFEBRSQEQKNELRREVTEVVSRIAKAPKENIHVFIHDMPEGTYYPQGELKKK 60 
MPFV IDLFEGRSQEQKN+LAREVTEWSRIAKRPKENIHVFINDMPEGTyYPQGE+K+K 

Sbjct: 1 MPFVTIDLFEBRSCJEQKNQLAREVIEVVSRIAKRPKENIHVPIHDMPEGTYYPQGEMKQK 60 
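
Each labelled row of an alignment carries its own start and end coordinates, so the number of residues printed on the row (ignoring gap characters) should equal end minus start plus one. The following is a minimal sketch of that check, which can be used to flag rows where scanning has merged or dropped characters; the function name and the regular expression are illustrative assumptions.

```python
import re

# A labelled row: "Query: <start> <residues> <end>" or "Sbjct: ...".
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def row_is_consistent(row):
    """True when the residue count on the row matches its coordinates."""
    match = ROW.match(row.strip())
    if match is None:
        return False
    start, residues, end = int(match.group(2)), match.group(3), int(match.group(4))
    # Gap characters occupy a column but do not advance the coordinate.
    n_residues = sum(1 for ch in residues if ch != "-")
    return n_residues == end - start + 1


# A nine-residue row spanning positions 181 to 189.
print(row_is_consistent("Sbjct: 181 RKHYFNPDI 189"))  # True

# A row labelled 1 to 60 that carries only 59 characters, as happens when
# two adjacent letters are read as one during scanning.
damaged = "Query: 1 MPFVKIDLFEGRSQEQKNELAREVTEWSRIAKAPKENIHVFINDMPESTYYPQGELKKK 60"
print(row_is_consistent(damaged))  # False
```

A row that fails the check is not necessarily wrong in its coordinates; the sketch only reports that the characters and the numbers disagree.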

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2131 

A DNA sequence (GBSx2247) was identified in S.agalactiae <SEQ ID 6583> which encodes the amino 
acid sequence <SEQ ID 6584>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9843> which encodes amino acid sequence <SEQ ID 9844> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65759 GB:AE001250 conserved hypothetical protein [Treponema 
pallidum] 

Identities = 103/317 (32%), Positives = 163/317 (50%), Gaps = 15/317 (4%) 



Query: 7 
Sbjct: 31 

Query: 63 
Sbjct: 91 

Query: 123 
Sbjct: 151 

Query: 178 
Sbjct: 211 

Query: 232 
Sbjct: 271 

Query: 292 
Sbjct: 331 

+V L Q GM++DLC3A+AKG++ADKI+ L +DSA+++L 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6585> which encodes the amino acid 
sequence <SEQ ID 6586>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1020 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 182/310 (58%) , Positives = 232/310 (74%) 

Query: 8 LSHSLRMGTTIDIQINSKNaQKQIRBVIELLBLYiaQRFSJiNDFNSEIJMNNNaGIKPI 67 
++ L+LMGT IDIQI S A 4.0+ VI+LL YKNRFSRND NSEMRIN AG+KP-I- 
Sbjct: 3 VTQQLKLMGWIDIQlESDKACQQLSRVIDLLYTYKNRFSflNDSNSELMAINQaflGVKPV 62 

Query: 68 
Sbjct: 63 

Query: 
Sbjct: 123 

Query: 188 
+ IGIQ P KRG+++6 +K+ N SWTSG YER+ K+YHHI DRQTGy 
Sbjct: 183 ?FRIGIQKPDRKRGQHI/SVIKVmHSVVTSGIYERQFTSl(BKQIHHlLDRQrSY 242 

Query: 248 PlQTEMASISIVSKQSVDCEIimKLPGLSIKEaLDILNAVSYIEGIIITKDDRIYIiSDG 307 
PI+T+M S++I++ S C+IWITRLFGI) + H-LN IEG+++T+ + +S+G 
Sbjct: 243 PIETDMLSLTIMAPSSBYCDIWTTELFGLDSSMIITIiliNTFDNIEGLLVTRKHHVLMSNG 302 

Query: 308 LKHHFQLFYH 317 
L+H+FQ +yH 
Sbjct: 303 LRHYFQPYYH 312 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2132 

A DNA sequence (GBSx2248) was identified in S.agalactiae <SEQ ID 6587> which encodes the amino 
acid sequence <SEQ ID 6588>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0956 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG18632 GB:AY007504 unknown [Streptococcus mitis] 
Identities = 92/160 (57%), Positives = 119/160 (73%), Gaps = 1/160 (0%) 

Query: 1 MKLIGIVGTNSNKSTNRQLLQyMQQHFADKAEIELIEVKDLPLFNKPADKNVPQVILDIA 60 

MKL+ IVGTNSN+STNR+LL++MQ+HF+DKA+IE++E+K LP FN+P D+ P + 4 
Sbjct: 1 MKIjVRIVGTNSmiSTireKU^KFMQKHFSDKaDIEVLEIKQLPAENEPEDEQ2«>AEVC3AFS 60 



Query: 61 AKIEETDGVIIGTPEYDHSIPSAmSVLAISn^GIYPmsiKPVMITGASYGTLGSSI^ 120 

KI DGVII TPEYDH+IP+ L S L W++Y L+NKP m GAS G LG+SRAQ 
Sbjct: 61 EKILAADGVIISTPEYDHTIPAPLASaLEWlAYTSRALINKPTMIVGASLGLLGTSRAQA 120 

Query: 121 QLRQILNAPELKASVLP-DEFLLSHSLQAFDKDGNLHDIE 159 

LRQIJj+APELKA V+P EF L HS Q D + +L++ E 
Sbjct: 121 HIiRQII£lRPELKARVMPGTEFFMHSEQWLDEJECHIitINPE 160 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6589> which encodes the amino acid 
sequence <SEQ ID 6590>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAB62679 GB:AL133422 putative secreted protein. [Streptomyces 
coelicolor A3(2)] 

Identities = 68/192 (35%), Positives = 94/192 (48%), Gaps = 25/192 (13%) 

Query: 4 ILFIVGSIJlEGSFMHQIiAAQAQK-AI.EHQAVVS™!mKDVPVIjNQDIEANA 60 
IL +VGSLR GS N QLA A + A E V + ++P N+D1+ +P A 
Sbjct: 5 IIALVGSLRAGSHNRQIAEAAVRFAPEGaEVQLFEGLAEIPPYNEDIDVEGSVP^ 64 



Query: 61 RQAVQSAnAIWlFTPVYNFSIPGSVKNLr£)WLSRAIiDLSDPTGPSAIGGKVVTVSSWANG 120 

R+A Q A A +F+P ra +IP +KH +DWLSR P6AGKVV AG 

Sbjct: 65 REAAQGAQAFLLFSPEYNGTIPAVLKNAIDWLSR PYGAGAFTGKPVAWGTAFG 118 
Query: 121 GHDQVFDQFKft. I1I.PFIRTSVAGEFTK-ATVNP--DAWGTGRLEISKETKA 167 

+ V+ Q +A ++ 1+ S+ G T+ A +P DA - +L E A 

Sbjct: 119 QYGGWAQDEARKAVGIAC3GKVIEDIKLSIPGSVTREAETHPADDAEVaAQL---TEVVA 175 


Query: 168 NLLSQAEALLAA 179 

L A+ +AA 
Sbjct: 176 RLHGHADEAIAA 187 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 28/90 (31%) , Positives = 49/90 (54%) 

Query: 3 LIGIVGTNSNKSTNRQLLQYMQQHFADKAEIELIEVKDLPLFNKPADKNVPQVILDIAAK 62 
++ IVG+ S N QL Q+ +A + + KD+P+ N+ + N P ++D 
Sbjct: 4 ILFIVGSLREGSEtffiQIAAQAQKALEHQRWSYUWKDVPVENGDIEaNAPLFVVDft^ 63 

Query: 63 IEETDGVIIGTPEYDHSIPSALMSVLAWLS 92 

++ D + I TP Y+ SIP ++ ++L WLS 
Sbjct: 64 VQSADAIWIFTPVYNFSIPGSVKNLLDWLS 93 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2133 

A DNA sequence (GBSx2249) was identified in S.agalactiae <SEQ ID 6591> which encodes the amino 
acid sequence <SEQ ID 6592>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1160 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2134 

A DNA sequence (GBSx2250) was identified in S.agalactiae <SEQ ID 6593> which encodes the amino 
acid sequence <SEQ ID 6594>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2132 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG18632 GB:AY007504 unknown [Streptococcus mitis] 

Identities = 80/162 (49%), Positives = 112/162 (68%) 



Query: 1 MKFVGIVGSNAEQSYNEMLLEFIRKNFKTKFEIOTLEIDDIPMraQDQNWEESFQIjRLLN 60 
Query: 61 NKlTRADGVIIATPEHiraTITiUiLKSVIiEWIiSFAVHPLENKPVMIVGRSYYDQGTSRAQI 120 

KI ADGVII+TPE++HTI'A L S LEW+++ L NKP MIVGAS GTSRAQ 
Sbjct: 61 EKILAADGVIISTPEYDHTIPAPIASMiEmiAYTSRRLINKPTMIVGASLGLLGTSRAQA 120 

Query: 121 HLRKILDAPCSVNAYTLPGNEFLLGKftKEAFDDNGNIlNPGTV 162 

HLR+ILDAP + A +P6 EF LG +++ DD ++ MP V 
Sbjct: 121 HLRQILnAPELKftEVMPGTEFFLGHSEQVLDDBCSiaiNPEKV 162 

There is also homology to SEQ ID 6596. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2135 

A DNA sequence (GBSx2251) was identified in S.agalactiae <SEQ ID 6597> which encodes the amino 
acid sequence <SEQ ID 6598>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.32 Transmembrane 13 - 29 ( 11 - 29) 

Final Results 

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
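
Each "INTEGRAL" line reports a likelihood score, a transmembrane segment and a parenthesised pair of positions. The following is a minimal sketch of reading such a line into a small record, assuming the line is in its cleaned form. The class name, the field names and the reading of the parenthesised pair as an extended range around the segment are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TMSegment:
    likelihood: float   # more negative values indicate a stronger prediction
    start: int          # first residue of the reported segment
    end: int            # last residue of the reported segment
    outer_start: int    # parenthesised range (assumed to be an extended region)
    outer_end: int

    @property
    def length(self):
        return self.end - self.start + 1


INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return a TMSegment, or None if the line is incomplete or damaged."""
    match = INTEGRAL_LINE.search(line)
    if match is None:
        return None
    likelihood, *positions = match.groups()
    return TMSegment(float(likelihood), *(int(p) for p in positions))


seg = parse_integral("INTEGRAL Likelihood = -7.32 Transmembrane 13 - 29 ( 11 - 29)")
print(seg.start, seg.end, seg.length)  # 13 29 17
```

Lines whose numbers were lost in scanning return `None` rather than a partial record, which keeps damaged entries out of any downstream tally.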

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2136 

A DNA sequence (GBSx2252) was identified in S.agalactiae <SEQ ID 6599> which encodes the amino 
acid sequence <SEQ ID 6600>. This protein is predicted to be a potential nitrite transporter. Analysis of this 
protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 










INTEGRAL Likelihood = -9.92 Transmembrane 61 - 77 ( 54 - 82) 
INTEGRAL Likelihood = -5.57 Transmembrane 106 - 122 ( 103 - 126) 
INTEGRAL Likelihood = -5.15 Transmembrane [...] - 176 ( 159 - 177) 
INTEGRAL Likelihood = -4.09 Transmembrane 180 - 196 ( 179 - 199) 
INTEGRAL Likelihood = -1.01 Transmembrane 233 - 249 ( 233 - 249) 



Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15832 GB:Z99123 alternate gene name: ipa-48r~similar to 
nitrite transporter [Bacillus subtilis] 
Identities = 82/253 (32%), Positives = 119/253 (46%), Gaps = 10/253 (3%) 

Query: 6 EKIAYNCAKKEALYKESLGRYALRSMLAGAYLTMSTAAGIVAADTIGK-ISPALSGFVF- 63 
Query: 64 --AFIFSFGLIYVLIFNGEMiTSNNIiTLTAGAYNKNISWKKaMTILiycrFENLVGACIL 121 

A F ++ + G+L T N Y T A K ISW+ + + + NL+GA + 

Sbjct: 63 AaAVTFGAAIIWIAYGGGDLPIGNTi^FTYTALRKKISVmDTLYLWMSSYAGNLI^^ 122 



Query: 182 KMTVILSAIFMFVFLSNEHLIANFASFMLAAFSHIEHIKGFTLLNriRQWTLVFFGNWIG 241 

K+ ++ +F F EH IAN +F ++ lEH TL+ +R V GN 

Sbjct: 183 KBFTmLFVFCFFlSGFEHSIANMCTFAISIjI)--IEHPDTVTLMGAVRNLIPVTLGNLTA 240 

Query: 242 GGVFIGLAYAWLN 254 

G V +G Y LN 
Sbjct: 241 GIVMMGWMYYTIiN 253 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6601> which encodes the amino acid 
sequence <SEQ ID 6602>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 

Final Results 

bacterial membrane --- Certainty=0.4906 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

Query: 36 
Sbjct: 

Query: 96 
Sbjct: 64 

Query: 156 
Sbjct: 124 

Query: 216 
Sbjct: 184 

Query: 276 
Sbjct: 244 

A+ F6H +GLT 
h E+ I VA K+ S 
L+S IGCaSW V LA+VO. +GA DAA 
K; LG WFP+M FVA+GFQH VBN FVIPAAIF G TW F+ N I + GN+IGGA+F 
V +YF Y+ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 69/240 (28%), Positives = 101/240 (41%), Gaps = 18/240 (7%) 

Query: 15 KEALYKESLGRYALRSMLAGAYLTMSTAAGIVAADTIGKISPALSGFVFAFIFSFGLIYV 74 

KLKLG +GL+AA +TG ASVAPGLI + 
Sbjct: 55 KTFLAKSILGFIGGAMISLGYLLYVRIAAS--GLETFG AFSSIVGACAFPIGLIII 108 
Query: 75 I.IFNGEIJA.TS^1MLYLTAGAYNKNISKKKAI•1TI1JIYCTFFNLVGACILAWL■FNQSYSFQHL 134 

L+ GEL T NM+ ++A K I + + + T FN++GA +A++F F L 

Sbjct: 109 L^aGGELITG^MmVSAMlIAKKIKFSEIAKZ^WLIIT]:■Fl^JVIGAVFVAFVFGH---FLGI: 165 

Query: 135 TNDSFLGHWAK- - - 

T+ V + 

Sbjct: 166 1 

Query: 191 FMFVFLSNEHLIANFASFIOyiPSHIBHIKGFTLmIIRQWTLVFFG^IWlGGGVFIGIJAY 250 

FV L +H +AW A F G T L+ + + V+ GN IGG +F+ Y 

Sbjct: 226 MTFVALGFQHSVANAFVIPAAIFE GGATWLDFVTNFIFVYSGNIIGGAIFVSFLY 280 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2137 

A DNA sequence (GBSx2253) was identified in S.agalactiae <SEQ ID 6603> which encodes the amino 
acid sequence <SEQ ID 6604>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1342 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2138 

A DNA sequence (GBSx2254) was identified in S.agalactiae <SEQ ID 6605> which encodes the amino 
acid sequence <SEQ ID 6606>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.22 Transmembrane 44 - 60 ( 44 - 60) 

Final Results 

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2139 

A DNA sequence (GBSx2255) was identified in S.agalactiae <SEQ ID 6607> which encodes the amino 
acid sequence <SEQ ID 6608>. This protein is predicted to be xanthine permease (pbuX). Analysis of this 
protein sequence reveals the following: 

Possible site: 23 



>>> Seems to have no N-terminal signal sequence 










INTEGRAL Likelihood = -7.91 Transmembrane 160 - 176 ( 156 - 188) 
INTEGRAL Likelihood = -6.48 Transmembrane [...] - 200 ( 179 - 211) 
INTEGRAL Likelihood = -6.[...] Transmembrane 101 - 117 ( 96 - 121) 
INTEGRAL Likelihood = [...].04 Transmembrane 309 - [...] ( 306 - 332) 
INTEGRAL Likelihood = [...].98 Transmembrane 334 - 350 ( 331 - 353) 
INTEGRAL Likelihood = -3.88 Transmembrane 400 - [...] ( 396 - 420) 
INTEGRAL Likelihood = -3.45 Transmembrane 19 - 35 ( 18 - [...]) 
INTEGRAL Likelihood = -2.81 Transmembrane 127 - 143 ( 127 - [...]) 
INTEGRAL Likelihood = -2.71 Transmembrane 228 - 244 ( 227 - 249) 
INTEGRAL Likelihood = -2.02 Transmembrane [...] - 63 ( 47 - 63) 
INTEGRAL Likelihood = -1.97 Transmembrane 75 - 91 ( 73 - 92) 
INTEGRAL Likelihood = -0.85 Transmembrane 368 - 384 ( 368 - 384) 



Final Results 

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14123 GB:Z99115 xanthine permease [Bacillus subtilis] 
Identities = 213/412 (51%), Positives = 292/412 (70%), Gaps = 5/412 (1%) 

Query: 14 LGLQHLLAMSAGSILVPIMIASaLGYNAKQLTyLIATDIFMCGIATLLGLRLSKHFeVGL 73 

La+QH+IiAMH«3+I+VP+++ A+G +QLTyL++ DIFMCG+ATLLQ+ ++ F6+6L 
Sbjct: 11 IX3IQHVLAMH«aiVVPLIVGKAMGLTVEQLTYLVSIDIFMCGVATIiLQVWSNRFFGIGL 70 

Query: 74 FVVLGCAFQSVAPLSIIGAQQGSGXMFGALIASGIYVVLVaGIPSIOTaNFFPPIVTGSVI 133 

PWL6C F +V+P+ IG++ G ++G++IASGI V+L++ F K+ +FFPP+VTGSV+ 
Sbjct: 71 PWLGCTFTAVSPMIAIGSEYGVSTVYGSIIASGILVILISFFFGKLVSFFPPVVTGSW 130 

Query: 134 TTIGLTLIPVaMGNMGD- - -NAKEPSLQSLTLSLVTIGWLLINIFAEGFLKSISILIGL 190 

T IG+TL+PVAM NM +A L +L Ii+ + +++L+ F RGF+KS+SILIG+ 

Sbjct: 131 TlIGITLMPVAMNNMaGGEGSADPGDLSNLALAFTVLSIIVLLYRFTKGFIKSVSILIGI 190 

Query: 191 ISGTILAAFMGLVDASVYADAPLVHIPKPFYFGAPRFEFTSILMMCIIATVSMVESTSVY 250 

+ GT +A FMG V V+DA +V + +PFyFGAP F 1+ M I+A VS+VESTGVY 
Sbjct: 191 LIGTFIAYFMGKVQFDNVSDAAVVQMIQPFYFGAPSEEiaAPIITMSIVaiVSLVESTGVY 250 

Query: 251 LRLSDITNDKLDSKRLRNGYRSEGLRVLLGGLFNTFPYTGFSQWVGLVQISGIRTRKPIY 310 

AL D+TN +L L GYR+EGLAVLLGG+FN FPST FSQWGLVQ++GI+ I 
Sbjct: 251 FALGDLTNRRLTEIDLSKGYRAEGLAVLLQGIFNAPFYTAFSQNVGLVQLTGIKKNAVIV 310 

Query: 311 FTALFLVILGLLPKFGAMAQMIPSPVLGGAMLVLFGMVALQGMKMLNQVDFEHNEHNFII 370 

T + L+ GL PK A +IPS VLGGAM+ H-FGMV G+KML+++DF E N +1 
Sbjct: 311 VTGVILMAFGLFPKIAAFTTIIPSAVLGGAMVAMFGMVIAYGIKMLSRIDFAKQE-NLLI 369 

Query: 371 AAVSIAAGVGFNGT-NLFISLPNTLQMFLTNGIVISTLTAWLNIILNGLPK 421 

A S+ G+G ++F LP+ L + THGIV + TAWI1NI+ N K 

Sbjct: 370 VACSVGIiGLG\7TWPDIFKQLPSALTLLTTI!lGIVAGSFTAVVIiNIVYWVFSK 421 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6609> which encodes the amino acid 
sequence <SEQ ID 6610>. Analysis of this protein sequence reveals the following: 



Possible site: 29 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -7.32 Transmembrane 
INTEGRAL Likelihood = -6.37 Transmembrane 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 

Final Results 

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:CAB15234 GB:Z99120 similar to purine permease [Bacillus subtilis] 
Identities = 216/421 (51%), Positives = 302/421 (71%), Gaps = 5/421 (1%) 

Query: 6 KQEHSHSQSAVLGLQEIVLSMYAGSILVPIMIflGZiLGYSARELTYLISTDIFMCGVaTFLQ 65 
K++H+ Q +LGLQH+L+MYAG+ILVP+++ A+G +A +LTYLI+ D+FMCG AT LQ 
Sbjct: 2 KEQHNALQLMMLGLQHMLAMy&GAILVELIVGMIGLSaGQL^ 61 

Query: 66 
Sbjct: 62 

Query: 126 
Sbjct: 122 

Query: 
Sbjct: 182 

Query: 243 
Sbjct: 

Query: 303 
Sbjct: 302 

Query: 363 
Sbjct: 362 

FFPP+VTGSV+ +IG+SL+ 
ILV IGL+PK A+ +IP+PVLGGAM+V+FGMV G++ML+ V 
SI +1 A S+S GLG 
+6IVI +LT++ L+ 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 328/416 (78%) , Positives = 380/416 (90%) 

Query: 7 SNSQRAIiGLQHLLaMYAGSILVPIMIASftliGYKaKQLTYLIATDIFMCGIATLIjQLRL^ 66 

S+SQ+A+LGLQH+L+MXAGSILVPIMIA ALC3Y+A++LTYLI+TDIFMCG+AT LQL+L+ 
Sbjct: 10 SHSQSAVIX3LQHVLSMYMSILVPIMIAfiALGYSAEELTYLISTDIFMCGVa.TFLQLKLT 69 

Query: 67 KHFeVGLBWLGCAFQSVAPLSlIGAQQGSGYMPGALIASGIYVVLVaGIFSKWANPFPP 126 

KH GVGLPWLGCAFQSVaPLSIIGaQQGSG MFGaLlAS6IYV+LVaOIFSK+A PFPP 
Sbjot: 70 KHTGVGLFWLGCAPQSVRPLSIIG&QQGSGRMFGALIASGlYVILVaGIFSKIflRFFPP 129 

Query: 127 IVTGSVXTTIGLTLIPVAiraiMGntlAKEPSLQSLTLSLVTIGVVIilNIFAKGFLKSISI 186 

IVTGSVIT IGL+L+ V2iMSNMGDN KEP+ QS+ LSL+TI ++IjL+ F KGF+KSISI 
Sbjct: 130 IVTGSVITVIGLSLVG\mMGNMGDNVKEPTAQSiy»SLLTIVIIIJjVQKI'TKGFVKSISl 189 

Query: 187 LIGLISGTILflAHMGLVDaSWaDAPLVHIPKPFYFGAPRFEFTSILMMCIIATVSMVES 246 

LIGL++GT+++A MGLVD + V +A +H+P PFYFG P FE TSI+MMCIIATVSMVES 
Sbjct: 190 LIGLVRGTLVSAlmGLVOTTFWEASWIHVPTPFYFGMPTFEITSIV^MCIIATVSMVES 249 

Query: 247 TQVYLRLSDITNDKLDSKKLKNGYRSEGLAVLLGGLFHTFPTOSFSQIWGLVQISGI^^ 306 
Query: 307 KPIYFTMJFLVILGIlLPKFGaMRQMIPSPVIJGGA^^JVIJFGM\mLQGM^^ 366 

A LV++GLLPKF flMAQMIPSPVLGGaMLVLPGMWALQGM+MIJJ+VDF+ NE+ 
Sbjct: 310 RPIYYAaGILWlGLLPKFt»MRQMIPSPVLGGaMLVLFGMVaLQGMQMiaroVDFQK^ 369 

Query: 367 NFIIAAVSIMGVGFNGTNLFISLEimiQMFLTroiVISTLTAVVIM 422 

NFIIAAVSI+AG+GEHGTNLF SLP T QMFLTNGIVI+TLT+WEN++LNG K+ 
Sbjct: 370 NFILWiVSISAGICENQTNIlFASIIPETAQ^WLTNGIVIATLTSVVIraVIlIraKDI^ 425 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2140 

A DNA sequence (GBSx2256) was identified in S.agalactiae <SEQ ID 6611> which encodes the amino 
acid sequence <SEQ ID 6612>. This protein is predicted to be xanthine phosphoribosyltransferase (xpt). 
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1921 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA13587 GB:AJ233894 xanthine phosphoribosyltransferase 



Query: 16 GEiraLJWDSFLTHQVDFELMQEIGKVFADKYfCEAGITKVVTIESiSGIAPAVYAAQA^ 75 

G+NILKVDSFLTHQVDF LM+BIGKVFA+K+ AGITKWTIEASGIAPA++ A+AL VP 
Sbjct: 1 GDNILK\roSFLTHQVDFSLMREIGKVFAEKFASAGITKVVTIEASGIAPALFTAEALNVP 60 

Query: 76 MIFAKKAKNITMTEGILTAEVYSFTKQVTSQVSIVSRFLSlSroDTVLIIDDFIANGQAAKG 135 

MIPAKKAKNITM EGILTAEVYSFTKQVTS VSI +FLS +D VLIIDDFLftNGQAAKG 
Sbjct: 61 MIFAKKAKNITMNEGILTAEVYSFTKQVTSTVSIAGKFLSPEDKVLIIDDFLfiNGQAAKG 120 

Query: 136 LLEIIGQAGAKVftGIGIVIEKSFQDGRDLLEKTGVPVTSLAR 177 

L++II QAGA V IGIVIEKSFQDGRDtLEK G PV SLAR 
Sbjct: 121 LIQIIEQAGATVEAIGIVIEKSFQDGRDLLEKRGYPVLSLAR 162 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6613> which encodes the amino acid 
sequence <SEQ ID 6614>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 156/193 (80%) , Positives = 172/193 (88%) 

Query: 1 MKLnEERILMXSDVI^EKILKVDSPLTHQVDFEmQEIGKVEADKyKEAGITKWTIEAS 60 

M+IilfflERlL DG++LGENILKVD+FIiTHQVD+ LM+ IGKVFA KY EAGITKWTIEAS 
Sbjct: 1 MQlJ^RILTDGmifiKraLKVDWFLTHQVDWMKAIGKVFAQKYAEaGM 60 
Query: 61 GIAPAVYAAQALGVPMIFAKECAKNITMTEGILTAEOTSFTKQVTSQVSIVSRFLSNDDTV 120 

GIAPAVYAA+A+ VPMIPAKK KNITMTEGILTAEVYSFTKQVTS VSl +FLS +D V 
Sbjct: 61 GIAPAVYAAEAMDVPMIFAKKHKNITMTEGIIiTAEVYSFTKQVTSTVSIAGKFLSKEDKV 120 


Query: 121 LIIDDFLRNGQAAKGLLEIIGQRGRKWAGIGIVIEKSFQDGRDLI.EKTGVPVTSLARIKA 180 

LIIDDFIiRNGQAAKGL+EIIGQAGA+V G+GIVTEKSFQDGR L+E G+ VTSIiARIK 
Sbjct: 121 LIIDDPriHNGQAAKGLIEIIGQaGAQWGVGIVIEKSFQDGRRIiIEIDMGIEVTSIiARIKN 180 

Query: 181 FENGRVVFaEADA 193 

FENG + F EADA 
Sbjct: 181 FEMGNLNFLEADA 193 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2141 

A DNA sequence (GBSx2257) was identified in S.agalactiae <SEQ ID 6615> which encodes the amino 
acid sequence <SEQ ID 6616>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2546 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 



Query: 7 VFDYEDIQLIPNKCIISSRSQADTSVKLGNYTFKLPVIPANMQTIIDEEVAETIiACEGYF 66 
VFDYEDIQLIP KCI++SRS+ DTSV+IjG +TFKLPV+PANMQTIIDE++A +LA GYF 
Sbjct: 4 VFDYEDIQLlPAKCIVNSRSECDTSVEIKSGHTFKLPVVPAlJMQrilDEKIAISIAENayF 63 

Query: 67 
Sbjct: 64 

Query: 125 
Sbjct: 124 

Query: 185 
Sbjct: 184 

Query: 245 
Sbjct: 244 

Query: 305 
Sbjct: 304 

Y+MHRF E R FIK M+ +GL +SISVGVKD EY+FV L E+ PE++TIDIAHGH 
GTGGWQIAALRWC+KRa KPIIADGGIRTHGDIftKSlRFGA+MUMIGSLFASH ESPG+ 
+E +G+ +KEY+GSASE+ KGE KMVEGKK+ + KG ++DTL EM+QDLQSSISYAGG 
i+++R+VDYVI VKNS I +NGD 



A related DNA sequence was identified in S. pyogenes <SEQ ID 6617> which encodes the amino acid 
sequence <SEQ ID 6618>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2405 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 297/327 (90%) , Positives = 311/327 (94%) 

Query: 1 METCIIPVFDYEDIQLIPireCIISSRSQaDTSVKLGNYTFKLPVIPJiNMQTIIDEEVAETL 60 

MFNDIPVFDYEDIQLIPNKCri+SRSQaDTSV LG Y FKLPVIPANMQTIIDB +AE h 
Sbjct: 8 MFNDIPVFDYEDIQLIPNKCIITSRSQMTSVTLGKYQFKLPVIPANMQTIIDETIAEQL 67 

Query: 61 ACEGYFYIMHRFNEEERKPFIKRMHDKGLIASISVGVKDYEYDFVTSLKEDAPEFITIDI 120 

A EGYFYIMHRF+E+ RKPFIKRMH++GLIASISVGVK EY+FVTSLKEDAPEFITIDI 
Sbjct: 68 AKEGYFYIiynjRFDEDSRKPFIKRMHBQGLIASISVGVKACEYBFVTSLKEIiaPK 127 

Query: 121 JUIGHSNSVIEIinQHIKQELPETFVIAGinraTPEAVRELElSiaGaDATKVSIGPGKVCITKV 180 

AHGH+NSVI+MI+HIK ELPETFVIftGmraTPEAVRELE2B«3anATKVGIGPGKVCITKV 
Sbjct: 128 AHGHANSVIDMIKHIKTELPETFVD«aSIVGTPEAVRELENAGADATKVGIGPGKVCIT^ 187 

Query: 181 KTGFGTGGWQLAALRWCSKaARKPIIMGGIRTHGDIAKSIRFGliSMVMIGSIiFAGHLES 240 
KTGFGTGGWQLAALRWC+KAARKPIIADGGIRTHGDIAKSIRFGASMVMIGSLFAGH ES 
Sbjct: 188 KTGFGTGGWQLAALRWCAKAARKPIIADGGIRTHGDIAKSIRFGASMVMIGSLFAGHFES 247 

Query: 241 PGKLVEVEGQQFKEYYGSASEYQKGEHKNVEGKKILLPVKGRLEDTLTEMQQDLQSSISY 300 
PGK VEV+G4- FKEYYGSASEYQKGEHKNVEGKKILLP KG L DTLTEMQQDLQSSISY 
Sbjct: 248 PGKTVEVDGETFKEYYGSASEYQKGEHKirVEGKKILLPTKGHLSDTLTElMQQDLQSSISY 307 

Query: 301 AGGKELDSLRHVEYVIVKNSIWNGDSI 327 
AGGK+LDSLRHVDYVIVKHSIWNGDSI 
Sbjct: 308 AGGKDLDSLRHVDYVIVKNSIWDK3DSI 334 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2142 

A DNA sequence (GBSx2258) was identified in S.agalactiae <SEQ ID 6619> which encodes the amino 
acid sequence <SEQ ID 6620>. Analysis of this protein sequence reveals the following: 



Possible site: 57 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -16.[...] Transmembrane 421 - 437 ( 413 - [...]) 
INTEGRAL Likelihood = -8.[...] Transmembrane 166 - 182 ( 159 - [...]) 
INTEGRAL Likelihood = -8.[...] Transmembrane [...] - 236 ( 208 - [...]) 
INTEGRAL Likelihood = -5.[...] Transmembrane [...] - 338 ( 319 - [...]) 
INTEGRAL Likelihood = -5.[...] Transmembrane 199 - 215 ( 196 - [...]) 
INTEGRAL Likelihood = -4.[...] Transmembrane [...] - 359 ( 342 - [...]) 
INTEGRAL Likelihood = -4.[...] Transmembrane 291 - 307 ( 287 - [...]) 
INTEGRAL Likelihood = -3.[...] Transmembrane 8 - 24 ( 8 - [...]) 
INTEGRAL Likelihood = -3.[...] Transmembrane 133 - 149 ( 133 - [...]) 
INTEGRAL Likelihood = -3.[...] Transmembrane 254 - 270 ( 253 - [...]) 
INTEGRAL Likelihood = -2.[...] Transmembrane 53 - 69 ( 53 - [...]) 
INTEGRAL Likelihood = -1.[...] Transmembrane 77 - 93 ( 76 - [...]) 
INTEGRAL Likelihood = -1.[...] Transmembrane 109 - 125 ( 109 - [...]) 

Final Results 

bacterial membrane --- Certainty=0.7793 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

Gaps = 13/447 (2%) 
Query: 11 
Sbjct: 15 

Query: 71 
Sbjct: 75 

Query: 131 
Sbjct: 135 

Query: 191 
Sbjct: 195 

Query: 240 
Sbjct: 255 

Query: 300 
Sbjct: 

Query: 360 
Sbjct: 375 

Query: 420 
Sbjct: 435 

A L F G+LIETSMNVTFP I,M++F ++ +QW+TT LL VA T+ ++AF+ K 
P++GPTyGGVI+ L W++IF 
+LL+A++ ++F L N + 
Q I L+L+FIiLEN QL+L 4 
i7 ++ A ++ IG 
+DGN++ NTLQQ+AG+ T4-VAS 4 
H ++ +K K 



There is also homology to SEQ ID 46. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2143 

A DNA sequence (GBSx2259) was identified in S.agalactiae <SEQ ID 6621> which encodes the amino
acid sequence <SEQ ID 6622>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2151 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6595> which encodes the amino acid 
sequence <SEQ ID 6596>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 74/214 (34%), Positives = 112/214 (51%), Gaps = 5/214 (2%)

Query: 13 NESENNFFITLKTYFIWLFSIQIIT---DISTL]*:OTFDGSFAFHD1ETSIPHLVIDSNY 69
 N+ E F L +F++LF + I+T +1 + + F G F+FH+ + +P L ++
Sbjct: 15 NQLEETFIRELSHHFSHLFEVTILTSKANIQSNQLSTFQGIFSFHEHDIDLPTLYFKTSQ 74

Query: 70 LAISQraSKIESUroiKTFSELSKTMTEFHYMINFDLFNHLPYRFRLHinCDGQTIY
 ++ + IjS+ +T F+ + +LP .+ RL + +G I HH
Sbjct: 75 HGQGBliVTESVETXjaTAVLSLSQYLTGFYQKEDGHFLQYLPLQiUJLSrfflMSWIIVDNHAP 134

Query: 130 EDPFDIYPEEEYPIDKWVQNSLIEKKAKEIiHIiLPSaSQDYILVQSYKRLENDSGQLVGY 189
 F P + 1+ W+ L LLPS S D+I +Q Y+ L+N GQLVG
Sbjct: 135 NGSF--LPTTDKEIEDWIUffiLRLSDNPCKTELLPSGSLDHIYMQHYQaLKNPQGQLVGV 192

Query: 190 lEHVHNIKPLLEGYLKESGQRIVGWSDVTSQASI 223
 ++ V +IKPLL YL+E+GQRIVGWSDVTSG SI
Sbjct: 193 LDWQDIKPLLNQYLEETGQRIVGWSDVTSGPSI 226





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2144 

A DNA sequence (GBSx2260) was identified in S.agalactiae <SEQ ID 6623> which encodes the amino
acid sequence <SEQ ID 6624>. Analysis of this protein sequence reveals the following: 



Possible site:

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.10 Transmembrane 431 -     (     - 452)
INTEGRAL Likelihood = -8.92 Transmembrane 149 - 165 ( 147 - 174)
INTEGRAL Likelihood = -8.86 Transmembrane 404 -     ( 402 - 428)
INTEGRAL Likelihood = -7.91 Transmembrane 299 - 315 ( 293 - 318)
INTEGRAL Likelihood = -5.42 Transmembrane 380 - 396 ( 374 - 398)
INTEGRAL Likelihood = -5.31 Transmembrane 350 - 366 ( 347 -
INTEGRAL Likelihood = -4.57 Transmembrane  56 -  72 (  54 -  74)
INTEGRAL Likelihood = -3.24 Transmembrane 172 - 188 ( 171 - 198)
INTEGRAL Likelihood = -1.33 Transmembrane 224 -     ( 224 - 240)
INTEGRAL Likelihood = -0.59 Transmembrane 101 - 117 ( 101 - 117)


Final Results 

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
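
As a quick sanity check on the INTEGRAL rows listed before the Final Results above, the width of each predicted segment can be computed from its Transmembrane range. The sketch below is illustrative only: the helper name `segment_length` is hypothetical, and only the rows whose start and end positions are both legible in the text are included. Every such range spans the same number of residues, which would be consistent with the predictor reporting a fixed-width core segment; that interpretation is an assumption, not something the text states.

```python
# Illustrative sketch: check the width of the fully legible Transmembrane
# ranges from the INTEGRAL rows above. Rows with a missing start or end
# position in the text are deliberately left out.

# (start, end) pairs copied from the Transmembrane column
segments = [
    (149, 165), (299, 315), (380, 396), (350, 366),
    (56, 72), (172, 188), (101, 117),
]

def segment_length(start: int, end: int) -> int:
    """Number of residues in the inclusive range start..end."""
    return end - start + 1

lengths = [segment_length(s, e) for s, e in segments]
for (s, e), n in zip(segments, lengths):
    print(f"{s:>3} - {e:<3} -> {n} residues")
```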

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF84709 GB:AE004010 potassium uptake protein [Xylella 
fastidiosa] 

Identities = 201/570 (35%) , Positives = 319/570 (55%) , Gaps = 34/570 (5%) 



Query: 1 MAEMQtIVNHSSFDKASKafiFII--ALGIVYGDIGTSPLyTMQSLVENQGGISSVTESFIL 58 

M+ H + ++ G II A+G-1-V+GDIGTSPLYT++ G++ ++ +L 

Sbjct: 1 MSTSSHSGDCTAVPSNSNGTIILSAIGWFGDIGTSPLYTLKEAFSPNYGLTPNHDT-VL 59 

Query: 59 GSISLIIWTLTLITTIKYVLVALKADNHHEG3IFSLYTLTOKMTPW LIVPAVI 111 

G +SLI W + L+ TIKYV V ++ DN EGGI +L L ++ P4 + + + 

Sbjct: 60 GILSLIFWAMMLWTIKYVAVIMRVDKDGEGGIMALTALTQRTMPFGSRSIYIVGILGIF 119 

Query: 112 GGATLLSDGALTPAVTVTSAVEGLKWPSLQHIFQNQSNVIFATLFILLLLFAIQRFGTG 171 

G + DG +TPA++V SAVEGL+V F V+ TL +L+LLF QRPGT 

Sbjct: 120 GTSLFFGDGVITPAISVLSaVEGLEVAEPHMKAF WPITLRVLILLFLCQRFGTE 174 



Query: 172 VIGKLFGPIMFIWFAFLGISGLLHSPAHPEVFKAINPYYGLKLLFSPENHKGIFILGSIF 231 
 +GK FGPI +WF +G+ G+ N PEV AINP +GL F +F+LG++

Sbjct: 175 RVGKTFGPITLLWFIAIGVVGVYNIAQAPEVLHAINPSWGLH-FFLEHGWHSMFVLGRVV 233

Query: 232 lATTGfiSMYSDLGHVGRGNIHVSWPFVKVAII -LSYCGQGAWILANKNAGNELNPFFAS 290

LA TG EALY+D+GH G I +W +V + ++ L+Y C-QaA +L+N A NPF+ S 
Sbjct: 234 IAVTGGEALYADMGHFGRK&lRHA!#iyWLPMIALNYLGQ3ALVLSNPTAlG- -MPFYQS 291 

Query: 291 IPSQFTMHWILATIiAAlIASQAIiISGSFTLVSEAMELKIFPQFRSTYPGDN-IGQTYIP 349 

IP ++ LAT AA+IASQRr.I+GS++L S+AM+Ii P+ + + IGQ Y+P 

Sbjct: 292 IPDVraLYPMIArATAAAVIASQALITGSYSLSSQRMQLGYIPRMNVRHTSQSTIGQlYVP 351 

Query: 350 VINWPLFAITTSIVLLFKTSaHWIElUlYGI^ITITMLMTTILLSFFL-IQKGVKRGLVLLM 408 

+KW L + V+ F S M +AYG+A+T TM++TT+L+ + V R ++ +M 

Sbjct: 352 TVMmiiLTIiVILTVIGFGDSTSMASAYGVAWGTmiTTVLMIIYARANPRVPRLMLVWm 411 

Query: 409 MIFFGILEGIFFIASAVKIWHGGYVVVIIAVaiIFIMTIWYKBSKIVSRYVKL--LDLKD 466 

I F ++G FF A+ +KFM G + +++ V I M W +G K++ ++ ++11 + 
Sbjct: 412 AIVFIAVDGAFFYANIIKFMDGAWFPLUjGWIBTFMRTWLRGRKLLHEEMRKDGINLDN 471 

Query: 467 YIGQLDKLRHDHRYPIYHTNVVYLTNEMEEDMIDKSIMYSILDKRPKKAQVYWFVNIKVT 526 

++ L L + P V+LT + ++ ++M+++ + + F+ +K 

Sbjct: 472 FLPGL-MLAPPVKTO---GTAVFLT--ADSTWEEBUUfflNLKHNK^ 524 

Query: 527 DEPYTA EYKVDMMGTDFIVKWEIiYLGF 553 

PY A K++ + F +V + GF 

Sbjct: 525 KIPYAANSERLKIEPISNGF-YRVHIRFGF 553 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6625> which encodes the amino acid 
sequence <SEQ ID 6626>. Analysis of this protein sequence reveals the following: 



Possible site:

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
Likelihood = 343 - 353
Likelihood = Transmembrane 221 -
Likelihood =

----- Final Results -----

bacterial membrane --- Certainty=0.5713 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



>GP:AAF84709 GB:AE004010 potassium uptake protein [Xylella 
fastidiosa] 

Identities = 177/467 (37%), Positives = 270/467 (56%), Gaps = 20/467 (4%)

Query: 7
Sbjct: 11

Query: 66 TLITTIKYVLIALKaDNHHEGGIFSLFTLVRKMSPW LI IPAMIGGATLLSDGA 118
 L+ TIKYV + ++ DN EGGI +L L ++ P+ + I + 6 + DG
Sbjct: 70 ^ttVVTIKYVAVIMRVDNDGEGGIMALTALTQRTMPFGSRSIYIVGILGIFGTSLFFGDGV 129

Query: 119
 +TPA++V SA+EGL+ V+ TL +LI+IiF QRFGT +GK FGP+
Sbjct: 130

Query: 179
 +WF +GV G 4 E+ AIHP + LH F +F+LG++ LA TG EALY
Sbjct: 185

Query: 239 SDLGHVGRGNIYVSWPFVKM-CIVLSYCGQAAWILANKHSGIELKPFFASVPSQLRVYLV 297
 +D+GH G' I +W +V + + L+Y GQ A +L+N + NPF+ S+P ++
Sbjct: 244 ^U)^raHPGaKA.IRHR^#mA?LPMLaIJm,GQGM,VLSNFm 301

Query: 298 SLATJ^IIASQALISGSBTLVSEaWttOiKIFPLFRVTYPG-Ain^QLYIPVINWILFAVT 356
 +LRT Aa.+iaSQRLI+GS++L S+AM+L P V + + +GQ4-y+P +NW L +
Sbjct: 302 ALATAAAVIASQaLITGSYSLSSQAMQISYXPRMNVRHTSQSTIGQIYVPTVN^ 361

Query: 357 SCTVlaPRTSAHMEAAYGIAITITML^4TTILI.KYYLlKlraTRPILRHLVM^ PALVEFI 415
 TV+ F S M +AYG+A+T TM++TT+L+ Y PL +MA F V+
Sbjct: 362 ILTVIGFGDSTSIffiSAYGVAVTGTImITTVLMIIYflE^aSIPRVPRIMLWM^ffiIVFIAVI^ 421

Query: 416 FFLASAIKFMHGGYAWIlJU^IVFVMFIWHaGTRIVFKYVKSIiNIJI 462
 FF A+ IKFM G + ++L + 1 M W G +++ + ++ +N
Sbjct: 422

An alignment of the GAS and GBS proteins is shown below.

Identities = 485/651 (74%), Positives = 575/651 (87%)

Query: 10
Sbjct: 7

Query: 70
 LITTIKYVL+ALKADNHHEGGIFSL+TLVRKM+PWLI+PA+IGGATLLSDGALTPAVTVT
Sbjct: 67

Query: 130
 SA+EGLK VP L HI+QNQ+NVI TL IL++LF IQRFGTG IGK+FGP+MPIWF+FI-G
Sbjct: 127

Query: 190
 +SG N+ H E+FKAINPYY L LLPSPENH+GIFILGSIPLATTGaEALYSDLGHVGR
Sbjct: 187

Query: 250
 GNI+VSWPFVK+ I+LSYCGQ AWILM)JK++G EUIPFFAS+PSQ +++V LATLAAII
Sbjct: 247

Query: 310
 ASQALISGSFTLVSEAMRLKIFP FR TYPG N-
Sbjct: 307 ASQALISGSFTLVSEAMRLKIFPLFRVTYPGANLGQLYIPVINWILFAVTSCTVLAFRTS 366

Query: 370 AHMEAAYGIAITITMMTTILLSPFLIQKGVKRGLVLLMMIFFGILEGIFFLASAVKFM^ 429
 AHMEAAYGLAITITMLMTTILL ++LI+RG + L L+M FF ++E IFFLRSA+KFMH
Sbjct: 367 AHMEAAYGimiTITMLMTTILLKYYLlKKGTRPIIlJmLVMAFFAL 426

Query: 430 GGYVWIIAVAIIFIMTIWYKGSKIVSRYVKLLDLKDYIGQLDKIiRHDHRYPIYHraVVY 489
 GGY WI+A+AI+F+M IW+ G++IV +YVK L+L DY Q+ +IiR D + +Y TNWY
Sbjct: 427 GGYAWIIJU^IVFVMPIWHaGTRIVFKYVKSL^^aroYKEQIKQLRDDVCro^ 486

Query: 490 LTNR^ffiEmMIDKSIMySILDKRPKKAQVYWFVNIKV^DEPYTAEYKVDmGTDPIVKVEL 549
 L+NRM++ MID+SI+YSILDKRPK+AQVYWFVN++VTDEPYTA+YKVDMMGTD++V+V L
Sbjct: 487 LSNRMQDIMIDRSILYSILDKRPKRAQVYWFVNVQVTDEPYTAKYKVDMMGTDYMVRV^ 546

Query: 550 YLGFKmQTVSRYLRTIVEELLESGRLPKQGKTYSVRPDSNVGDFRFIVLDERFSSSQNL 609
 YLGP+M QTV RYLRTIV++L+ESGRLPKQ + Y++ P +VGDFRF++++ER S+++ L
Sbjct: 547 YLGPRMPQTVPRYLRTIVQDIMESGRLPRQEQEYTITPGRDVGDPRFVLIEERVSHaRQL 606

Query: 610 KPGERFV^ttiMKSSIKHWTATPIRWFGLQPSEVT^EVVPLIFTAKRGLPIKE 660
 ERF+M K+SIKH TA+P+RWFGLQtSEVT EWPLI + LPIKE
Sbjct: 607 SNFERFIMQTKASlKHVTASPmWFGLQYSEVTLEVVPLILSDVIiKLPIKE 657

A related GBS gene <SEQ ID 8983> and protein <SEQ ID 8984> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 5.84
GvH: Signal Score (-7.5): -4.59

Possible site: 18
>>> Seems to have an uncleavable N-term signal seq
-12.10 threshold: 0.0
INTEGRAL Likelihood = Transmembrane - 447 ( 423 -
INTEGRAL Likelihood = Transmembrane - 165 ( 147 -
INTEGRAL Likelihood = Transmembrane - 420 ( 402 -
INTEGRAL Likelihood = Transmembrane - 315 ( 293 -
INTEGRAL Likelihood = Transmembrane - 396 ( 374 -
INTEGRAL Likelihood = Transmembrane - 366 ( 347 -
INTEGRAL Likelihood = Transmembrane -  72 (  54 -
INTEGRAL Likelihood = Transmembrane - 188 ( 171 -
INTEGRAL Likelihood = Transmembrane - 240 ( 224 -
INTEGRAL Likelihood = Transmembrane - 117 ( 101 -
PERIPHERAL Likelihood = 2.10
modified ALOM score: 2.9

----- Final Results -----

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:



ORF02578(3e7 - 1630 of 2607)

GP|9106998|gb|AAF84709.1|AE004010_6|AE004010(25 - 463 of 634) potassium uptake protein
{Xylella fastidiosa}
%Match = 17.8

%Identity = 40.4 %Similarity = 63.7

Matches = 177 Mismatches = 150 Conservative Sub.i



210 240 270 300



TSTCLS*LK**RPGNALIISGLFIDKCCFENLICy]SIEPSHPED*YYLIGGLAEMQH\OTSSPDKASKAGFIIl^ 

\--\--h\\ 

MSTSSHSGDCTAVPSNSNGTIILSAIGWFGD 



IGTSPLYTMQSLVElJIQGGISSVTESFILGSISLIIWTLTLITTIKyVLVaLKADNHHEGGIFSLyTLVRKMTP W 

lllllllh: \:: - =11 =111 I = 1= lllll I == II llll H I - i 

IGTSPLyTLKEAFSPNYGLTPIffiDT-VLGILSLIFWMlMLW^IKYVAVI^4RVDNDGEGGIMALTALTQRTMPFGSRSIY 



LI-VPAVIGGATLLSIXW.TPAVTVTSAVEGLKVVPSLQHlFQNQSOTIFATIJfILLLLFAIORFGTGVIGKLFGPIMFI 
= : : : I : II =111 = = ! IIIIIM : == h II :|:|ll lllll =11 llll = = 

IVGILGIFGTSLFFGDGVITPAISVLSAVBX3LEV AEPHMKAFWPITLAVLILLFLCQRFGTEEVGKTFGPITLL 



879 909 939 969 999 1029 1059 1089 

WFAFLGISGLLNSFAHPEVFKAINPYYGLKLLFSPENHKGIFILGSIFLATTGAEALYSDLGHVGRGNIHVSWPFVKVAI 
II =|: |: I III: llll =11 === I =1=11== II II 1111=1=11 I I =1 =1 = = 

WFIAIGWGVYWIAQAPEVLHAINPSWGLHFFLEHGWHS-MFVIK3RWLAWGGEALYADMGHFGAKAIRHAWMYVVLPM 



1116 1146 1176 1206 1236 1265 1296 1326 

I-LSYCGQGAWIIANKNAGNELNPFFASIPSQFTMHWILATLAAIIASQALISGSFTLVSEAMRLKIFPQFRSTYPGDN 
: hi llll =1 = 1 I 111= III == III lhlllllll = l|: = l 1 = 11 = 1 1= =■ = 

LAUmK3QGALVLSNPTA--IGNPFYQSIPDWGLYPMIALATAAAVIASQALITGSYSLSSQAMQLGYIPRMir™ 
280 290 300 310 320 330 340 

1353 1383 1413 1443 1473 1500 1530 1560 

-IGQTYIPVIKWFLFAITTSIVLLFKTSAHMEAAYGIAITIIMIMrTILLSFFL-IQKGVKRGLVLLMMIFFGILEGIFF 
III 1=1 =11 1= = 1= I I I =111=1=1 ll==ll=l= = I I :: :| II =:| || 

TIGQIYVPTVNWTLLTLVILTVIGFGDSTSMftSAYGVAVTGTMMlTTVLMIIYARAOTRVPRLMLViM^ilVFIATO

1590 1620 1650 1680 1710 1740 1770 1800

LftSAVKFMHGGYVWIIAVAIIFIWriWYEGSKIVSRYVKLLDLKDYIGQLDKIi^ 

|: :||| I : ::: I I | | :| == 
VftNIIKFMDGaWFPLI.LGWIPTFMRTWLRGRKLLHEE^ffllKDGIlE.DNFLPGLM^U>PVK^^ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2145 

A DNA sequence (GBSx2261) was identified in S.agalactiae <SEQ ID 6627> which encodes the amino
acid sequence <SEQ ID 6628>. This protein is predicted to be serine dehydrogenase. Analysis of this
protein sequence reveals the following:

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3261 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
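
A Final Results block of this kind is easy to read mechanically. The following is a minimal sketch, not the tool used in the analysis: the function name `parse_final_results` and the regular expression are hypothetical, and the sample lines are transcribed from the block above with the OCR spacing normalised. It picks the localisation with the highest certainty value.

```python
import re

# Sample lines transcribed from the Final Results block above
# (spacing normalised; this layout is an assumption of the sketch).
FINAL_RESULTS = """\
bacterial cytoplasm --- Certainty=0.3261 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

# Hypothetical pattern: site name, optional dashes, certainty value, call.
LINE_RE = re.compile(
    r"^bacterial\s+(?P<site>\w+)\s*-*\s*Certainty\s*=\s*(?P<score>\d\.\d+)\s*"
    r"\((?P<call>[^)]+)\)"
)

def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples, one per matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group("site"), float(m.group("score")), m.group("call")))
    return rows

rows = parse_final_results(FINAL_RESULTS)
best = max(rows, key=lambda r: r[1])  # highest certainty wins
print(f"predicted localisation: {best[0]} (certainty {best[1]:.4f}, {best[2]})")
```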



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD07424 GB:AE000552 short chain alcohol dehydrogenase 
[Helicobacter pylori 26695] 
Identities = 18/31 (58%) , Positives = 25/31 (80%) 

Query: 3 WVASQPEHININRIEIMPVSQTYGPQPVYRD 33 

W+ QP H+NINRIEIMP+SQT+ P P +++
Sbjct: 219 WIYEQPLHVNINRIEIMPISQTFAPLPTHKN 249 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6629> which encodes the amino acid 
sequence <SEQ ID 6630>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1021 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 24/33 (72%), Positives = 29/33 (87%) 

Query: 1 MSWVASQPEHININRIEIMPVSQTYGPQPVYRD 33 

+SWV QP H+N+NRIE+MPVSQ+YGPQPV RD 
Sbjct: 20 VSWVIHQPPHVNVNRIELMPVSQSYGPQPVTRD 52 
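
As a rough cross-check of the summary line above, the identity count can be recomputed directly from the two aligned rows. The sketch below is illustrative only: the function name `count_identities` is hypothetical, the sequences are copied from the alignment above, and positives are taken as identities plus the `+` marks on the match line rather than being recomputed from a substitution matrix.

```python
# Illustrative sketch: recount identities/positives for the ungapped
# GAS/GBS alignment shown above (Query 1-33, Sbjct 20-52).

QUERY = "MSWVASQPEHININRIEIMPVSQTYGPQPVYRD"   # Query row, residues 1-33
SBJCT = "VSWVIHQPPHVNVNRIELMPVSQSYGPQPVTRD"   # Sbjct row, residues 20-52
MATCH = "+SWV QP H+N+NRIE+MPVSQ+YGPQPV RD"    # match line as printed

def count_identities(a: str, b: str) -> int:
    """Count aligned columns where both rows carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")

identities = count_identities(QUERY, SBJCT)
# A '+' on the match line marks a conservative (positive-scoring) substitution.
positives = identities + MATCH.count("+")
length = len(QUERY)

# Integer division truncates, which reproduces the printed percentages here.
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Positives = {positives}/{length} ({100 * positives // length}%)")
```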

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2146 

A DNA sequence (GBSx2262) was identified in S.agalactiae <SEQ ID 6631> which encodes the amino
acid sequence <SEQ ID 6632>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9337> which encodes amino acid sequence <SEQ ID 9338>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10781> which encodes amino
acid sequence <SEQ ID 10782> was also identified. A further related GBS nucleic acid sequence <SEQ ID
10951> which encodes amino acid sequence <SEQ ID 10952> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA32349 GB:X14130 ORF (AA 1 to 299) [Lactococcus lactis subsp.
cremoris]

Identities = 72/215 (33%), Positives = 110/215 (50%), Gaps = 8/215 (3%)

Query: 4 RSKIjaaGFLTi:jflEnmTIJUiCSGKTSNC3TN--VVTMKBDTI 61
 + K+ L + L S6 SN T+ V T G +T S FY ++K S + +
Sbjct: 2 KKKmLKVLIJ^TATALLLLSGCQSNQTDQTVATYSGGKVTESSFYKEIiKQSPTTKIMI^ 61

Query: 62 TLILSROTDTQYGDKVSDKKVSEAYNKTAKGYGNSFSSftLSQaGLTPEGyKQQIRTTMLV 121

+++ R + YG VS K V-1-+AY+ + Y6 +F + LSQ G + +K+ +RT L
Sbjct: 62 NMLIYRlVlisHAYGKSVSTKTVNn&YDSYKQQYGENFDAFLSQNGFSRSSFKESL^ 121

Query: 122 EYAVKEAAKKELTEaNYKE&YKNYTPETSVQVIKLDAEDKaKSVLKDVKADGaDFAKIAK 181

E A+K+ K+++E+ K A+K Y P+ +VQ I ED AK V+ D+ A G DFA -HAK
Sbjct: 122 EVALKiai--KIWSESQLKAAWiaYQPKVTVQHILTSDEDTAKQVISDIM-GKDFAMI^ 178

Query: 182 E KTTATDKKVEYKFDSAGTTLPKEVMSAAPKL 213

T D + F+ TL AA+JOi
Sbjct: 179 TDSIDTATKDNGGKISFEUWKTbDATFKDAAYKL 213

A related DNA sequence was identified in S.pyogenes <SEQ ID 6633> which encodes the amino acid 
sequence <SEQ ID 6634>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> May be a lipoprotein 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAA25247 GB:M83946 maturation protein [Lactobacillus paracasei] 
Identities = 88/294 (29%) , Positives = 146/294 (48%) , Gaps = 14/294 (4%) 

Query: 7 LIASVVTIASVMAIJACQSTKDNTKVISrraSDTISVSDFVMETKNTEVSQKaM^ 66
 L+AS T +++ L+ CQS + KV + G ++ S+FY E K + ++ + N++I R

Sbjct: 10 LIiASTAT--ALIiLLSGCQSNQaDQKVATYSGGKVTESNFYKELKQSPTTraraiaNMLIYR 67

Query: 67 WEAQYGDKVSKKEVEKAYHKTAEQYGASFSAALAQSSLTPETFKRQIRSSKLVEYAVKB 126
YG VS K V AY +QYG +F A L+Q+ + +FK +R++ L E A+K+
Sbjct: 68 AIJmAYGKOTSTKTWSnmYDSYKQQYGENFDAFLSQNGFSRSSFKESLRTNFLSE\M 127

Query: 127 AAKKELTTQEYKKAYESYTPTMAVEMITLDKESTiyCSVLEELKAEGADFTAIAKE KT 183

K+++ + K +++Y P + V+ I +E+TAK V+ +L A G DF +AK T
Sbjct: 128 L--KKVSESQLKAWKTYQPKVTVQHIXiTSDEDTAKQVISDL-AAGKDFATLAKTDSIDT 184

Query: 184 TTPEKKOTYKFDSGATOTPTDWKaASSMffiXffilSDVISVLDPTSYQKICFYIVK^ 243
T+ P+S+ AALG+ P + ++K+

Sbjct: 185 ATKDNGGKISFESNNKTLDATFKDARYKLKNGDYTQT PVKTOIC3yEVIKMINH-P 238

Query: 244 KKSDWQEYKKRLK&IIIAEKSKDliOTQNKVIANRLDKJan^IKDKRFANILA^ 297
 K + KK L A + A+ S+D + +VI+ L +V IKDK A+ L Y

Sbjct: 239 AKXSTPTSSKKALTASVYAKWSRDSSIMQRVISQVLKNQHVTIKDKbiaDAIiDSY 292



An alignment of the GAS and GBS proteins is shown below. 

Identities = 125/213 (58%) , Positives = 168/213 (78%) , Gaps = 1/213 (0%) 

Query: 1 MKTRSKLAAGFLTLMSVATLAACSGKTSNGTNWTMKGDTITVSDFYDQVKTSKAAQQSM 60

MK +KL A +TL SV LAAC T++ T V++MKGDTI+VSDFY++ K ++ +Q++M
Sbjct: 1 MKNSimilASVVTIMVMAUiACQS-THDNTKVISMKGDTISVSDFYIffiTKlTrEV^ 59

Query: 61 LTLILSRVFDTQYGDKVSDKKVSEAYNKTAKGYGNSFSSALSQaGLTPEGYKQQIRTTML 120

L L++SRVF+ QYGDKVS K+V +AY+KTA+ YG SFS+AL+Q+ LTPE +K+QIR++ L
Sbjct: 60 liNLVISRVFEAQYGDKVSKKEVEKAYHKTAEQYGaSFSAAIAQSSLTPETFKRQIRSSKL 119

Query: 121 VEYAVKEAAKKELTEAISYKEAYKim'PETSVQVlKLnaEDKAKSVLKDVHffi 180
 VEYAVKEAAKKELT YK+AY++YTP +V++I LD E+ AKSVL+++KA+GADP lA

Sbjct: 120 VEYAVKEAAKKELTTQEYKKAYESYTPTMAVEMITLimETAKSVLEELKftEGaDPIAIA 179

Query: 181 KEKTTATDKKVEYKFDSAGTTLPKEVMSAAPKL 213
KEKTT +KKV YKFDS T +P +V+ AA Ii
Sbjct: 180 KEKTTTPEKKVTYKFDSGRTNVPTDWKAASSt. 212

SEQ ID 10782 (GBS657) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 143 (lane 8-10; MW 62.8kDa) and in Figure 187 (lane 3; MW 63kDa).
Purified GBS657-GST is shown in Figure 245, lanes 2 & 3.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2147 

A DNA sequence (GBSx2263) was identified in S.agalactiae <SEQ ID 6635> which encodes the amino 
acid sequence <SEQ ID 6636>. This protein is predicted to be methyltransferase. Analysis of this protein
sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA68045 GB:X99710 methyltransferase [Lactococcus lactis]

Identities = 132/227 (58%) , Positives = 169/227 (74%) 

Query: 1 WQSySKNaNHHMERPVTraEIVQYmQHQKQNKBCLAELEABSU^ 60 
MV++Y +N M RPWK E+V++MR Q Q G LaE+ ERK+ NIP+IPHET YF 
Sbjct: 1 MVETYKSTSNPMMNRPWKAELVEWMRSSQTQVTGEIJffiVLNE^^ 60



Query: 61 RFLMQTLQPKHILEIGTAIGFSALJ^lAENAPEaKlTTIDRNEEMIAIAKENFAKYDNHNQ 120 

+ I.+ L+PK ILEIGTAIGFSAL+MA+ PEA+I TIDRN EMI LAK+N AKYD+ NQ 
Sbjct: 61 QMLLSLLKPKRILEIGTAIGFSALVMAQEVPEaEIVTIDRNPEMIEIMKNIJ^ 120 

Query: 121 ITIiiEGDAVDVLQTIBKSYDFVFraDSAKSKYIOTLPQVLKHIiDVGGVVVIB 180 

I L EGDA DVLQ h +D VFMDSAKSKY+ FLP+ L+ L G++++DD+FQ G+I 
Sbjct: 121 IQLKEGiaADVLQEI.Ki3PPDLVFMDSAKSKYVEFLPKSLEr.t,SENGLILMDDVFQAGEIL 180

Query: 181 KPIDEVRRGQRTIYRGLQRLFDSTLQHPDLTATLVPLGDGLLMIRKN 227

PI EV+R QR + RGL++LFD +P +++PLGDGLLMI+K+ 
Sbjct: 181 LPIMEVKRNQRALERGLRKLFDEVFDNPKYMTSVLPLGDGLLMIKKH 227 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6637> which encodes the amino acid 
sequence <SEQ ID 6638>. Analysis of this protein sequence reveals the following:



Possible site: 46 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 153 - 169 ( 152 -

Final Results

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 MVKSYSKTANHNMRRPWKEELVHYMRTRQRQfTTGFLAELEQFARQENIPIIQPEWAYF 60

MV++Y T+N M RPWK ELV +MR+ Q Q TO LaE+ FA++ NIP+I B V YF 
Sbjct: 1 MVEraKSTSNPMMNRPVVTCAELVEWMRSSQTQVTGEIAEVIOTAK^ 60 

Query: 61 RFLLQSLQPKHILEIGTAIGFSALLMAENAPDATIVTIDRNREMIDFAKANFAKYDSRQQ 120

+ LL L+PK ILEIGTAIGFSAL+MA+ P+A IVTIDRN EMI+ AIC N AICYD R Q 
Sbjct: 61 QMLLSLLKPKRILEIGTAIGFSALVmQEVPEAEIVTIDRNPEMIEmKKNLAKYDHRNQ 120 

Query: 121 IRLLEGDAADILSTLEGNFDFVFMDSAKSKYIVPLPEILRLLKVGGWILDDVFQGGDIT 180 

I+L EGDAAD+L Ii+G FD VFMDSAKSKY+ FLP+ L LL G++++DDVFQ G+I 
Sbjct: 121 IQLKEGnaADVLQELKBPFDLVFMDSiUCSKYVEFLPKSLELLSENGLILMDDVFQftGEIL 180 

Query: 181 KPIEDIRRGQRTIYRGLQSLFDATLTHENLTTSLVPLSDGLLMIRKN 227 

PI +++R QR + RGL+ LFD +P TS++PL DGLLMI+K+ 
Sbjct: 181 LPIMEVKRNQRALERGLRKLFDEVFDNPKYMTSVLPLGDGLLMIKKH 227 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 177/235 (75%) , Positives = 199/235 (84%) 

Query: 1 MVQSYSKNaNHNMRRPVVKEEIVQYmQHQKQ^raGCLaELEaFAKQENIPIIPHETaTYF 60 

MV+SYSK AMHNMRRPWKEE+V YMR QKQ G LAELE FA+QKNIPII E YF 
Sbjct: 1 MVKSYSKTANHNMRRPVVKKELVHyMRTRQKQTTGFIAELEQPARQ^^ 60 

Query: 61 RFr*«3TLQPKHILEIGTAIGFSALLMAENAPEaKITTIDRNEEMIALAKENFAKYnNHNQ 120 

RFrr+Q+LQPKHILEIC3TAIGFSALLMAE2!iaP+A I TIDRN EMI AK NFAKYD+ Q 
Sbjct: 61 RPLIfiSLQPiaiII^:iGTAIGFSALLMAENBPimTIVTIDRNREMIDFAKaNFAKYDSRQQ 120 

Query: 121 ITLLEGDAVDVL(?rLDKSYDFVFmsaKSKYIVFLPQVLKHI£IVGGVVVIJ^ ISO 

I LLEGDR D+L TL+ ++DFVFMDSAKSKYIVFLP++L+ L VGGVV+LDD+FQGGDI 
Sbjct: 121 IRLLEGDAADILSTLEGNFDFVFMDSAKSKYIVFLPEILRLLKVGGWILDDVFQCaGDIT 180 

Query: 181 KPIDEVRRGQRTIYRGLQRLFDSTLQHPDLTATLVPLGDGLLMIRKNADHIVLED 235 

KPI+++RRGQRTIYRGLQ LFD+TL HP+LT +LVPL DGLLMIRKN IVL D 
Sbjct: 181 KPIEDIRRGQRTIYRGLQSLFEATLTHENLTTSLVPLSDGLLMIRKNQADIVLPD 235 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2148

A DNA sequence (GBSx2264) was identified in S.agalactiae <SEQ ID 6639> which encodes the amino
acid sequence <SEQ ID 6640>. This protein is predicted to be phosphoglycolate phosphatase. Analysis of
this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2193 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8985> which encodes amino acid sequence <SEQ ID 8986> 
was also identified. This protein appears to be a hydrolase i.e. an exposed protein. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA91552 GB:Z67740 unidentified [Streptococcus pneumoniae]
Identities = 39/117 (33%) , Positives = 67/117 (56%) , Gaps = 9/117 (7%) 

Query: 98 KEQESRDSKIHLM-PYAKEILEWrKEQDIPNFMYTHKG;^THSVLETLQISHYFDEILTG 156
 KE E+R+ + ++ ++LE Q +F+ +H+ +I,E l4- YF E++T

Sbjct: 25 KENEARELEHPILFEGVSDLLEDILN!3GGRHFLVSHRNDQVLEILEKTSIAA.yFTEVVTS 84

Query: 157 VSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNaGIKS MLR 207

SGF+RKP+P+ + YL ++Y + + IGDRP+D+E Q JM3+ + +NLR
Sbjct: 85 SSGFKRKENPESMLYLREKYQISSGLV-- IGDRPIDIEAGQAftGLDTHIiFTSIVMLR 139

SEQ ID 8986 (GBS240) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 57 (lane 2; MW 26kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 61 (lane 3; MW 51.5kDa).
GBS240-GST was purified as shown in Figure 225, lane 12.
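
The two apparent molecular weights quoted for GBS240 can be compared with a line of arithmetic. The sketch below is only a plausibility check: the figure of roughly 26 kDa for the GST moiety is an assumption brought in from general knowledge, not a value given in the text, and gel-based weights are approximate in any case.

```python
# Apparent molecular weights quoted in the text, in kDa.
HIS_FUSION_KDA = 26.0   # GBS240 His-fusion (Figure 57, lane 2)
GST_FUSION_KDA = 51.5   # GBS240 GST-fusion (Figure 61, lane 3)

# Assumption, not from the text: the GST moiety is roughly 26 kDa.
ASSUMED_GST_TAG_KDA = 26.0

# Extra mass carried by the GST construct relative to the His construct.
difference = GST_FUSION_KDA - HIS_FUSION_KDA
print(f"GST-fusion minus His-fusion: {difference:.1f} kDa")
print(f"assumed GST tag size:        {ASSUMED_GST_TAG_KDA:.1f} kDa")
```

Under that assumption the two constructs differ by about the size of the larger tag, which is what would be expected if both carry the same GBS240 insert.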

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2149 

A DNA sequence (GBSx2265) was identified in S.agalactiae <SEQ ID 6641> which encodes the amino 
acid sequence <SEQ ID 6642>. Analysis of this protein sequence reveals the following:

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2620 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 6643> which encodes the amino acid
sequence <SEQ ID 6644>. Analysis of this protein sequence reveals the following:

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2967 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 463/599 (77%), Positives = 541/599 (90%)

Query: 1 MSDNRSHIEEKyQWDLTTVFATDELWETEVVELTQAIDNAKGFSCaiLLDSSQSIiLEITEV 60 

M+DNRSH+BEKy WDI1+T+FATD+ WE EV +L ++ +KGF+GHLLDSS +LIH-+T+ 
Sbjct: 1 ^WDNRSHLEEKrITOLSTIFATDKDWBAEVSDIATBVEaSKBFAGHLLDSSflNljrlKVTKT 60 

Query: 61 ELDLSRRLEKVnHTYASMKNDQDTTOUWQEFQiUCATALYiUCFSETFSFYEPELL^ 120

L+L+RR+EKVYVYA MKtqDQDTTVAKYQE+CSMCA+ LYAKFSE FSFY+PE-H+ L + D 
Sbjct: 61 YLEIiARRVEK?SryVYJUmKimQDTTVAKYQEYQaKftSGL^^ 120 

Query: 121 YQSFIjIiEMPDLQKYDHFFEKIFflNKPim.SQNEEELIAGASEIFGARGETFEIIDN2mi^ 180

YQ+FL E P+L+ Y+HFF+K+F + HVLSQ EEELLAGA EIP A ETF ILONAD+V 
Sbjct: 121 YQAFLTETPELKVYNHFFDKLFQ2iREim.SQREEELIAGSiQEIPII«3AEETFSIU^^ 180 

Query: 181 FPVVBa!IAKGEEVELTHGNFISLMESSDRTVRKEAYQAMYSTYEQFQHTYAKTrQT^^^ 240 

FPWKN KGE+VELTHGMPISUffiS DR+VR+ Ay+AMYSTYEQPQHTYAKTLQTNVK Q 
Sbjct: 181 FPVVKOT)KGEDVELTHGNFISIJffiSia3RSTOQAAYEaMYSTYEQFQHTYAIOTLQTimTO 240 

Query: 241 NFKMVHHYQSMQSALSANFIPEEVYETLIKTVNHHLPLLHRYMKLRQKVLGLDDLKMY 300

N+KARVH Y SARQ+A++ANFIPE VY+TL++TVN HLPLLHRY+KLRQ+VLGLDDLKMY 
Sbjct: 241 NYKARVHKYDSARQAAMAftNFIPEAVYDTLLETVNKHLPLLHRYLKLRQEVLGLDDLKMY 300 

Query: 301 DVYTPLSQMDMSFTYDEALKKSEEVIAIFGEAYSERVHRAFTERWIDVHVNKGKRSGAYS 360

DVYTPLS+ D++ YDEMi+K+E+VLA+FG+ Y++RVHRAFTERWIDVHVNKGKRSGAYS 
Sbjct: 301 DVYTPLSETDIAIGYDEALEKAEKOTAWGKDYADRVHRAFTERWIDVHVNKGKRSGAY^ 360 

Query: 361 GGSYDTNAFMLIJmGDTIjm>YTLVHETGHSLHSTFTRENQPYWGDYSIFLAEIflSTTO 420 

GGSYDTNRF+LLNWQDTXCINLYTLVHETGHSLHSTFTRE QPYVYGDYSIFLAEIASTTN 
Sbjct: 361 GGSYimiAFlLi™QDTi™LYTLVHETGHSLHSTFTRETQPYWGDYSIFLaEIASTTN 420 



ENI+TE LL EV+D4-K RFAlIiNHYIiDGF+GT+FRQTQFAEFEHAlH ADQ+G+VLTSEY 
Sbjct: 421 ENIMTEALIibffiVQDEKERFAIIJfflYlBGFRGTVFRQTQFAEFEHAIHQADQKjGEVLTSEY 480 

Query: 481 lOTOLYAEIOTKYYGLTKEDNHFIQYEWARIPHFYYNYYVFQYATGFAAANYLAERIVNGN 540 

m LYA+IJffiKyYGL+K+DNHFIQYEWARIPHFYYNYYV+QYATGFAAA+YIiA++IV+G 
Sbjct: 481 LNQLYADUSEKYYGLSKraNHFIQYEWJU^IPHFYYNYYVYQYATGFAAASYLADKIVHGT 540 

Query: 541 PEDKKAYLKYLKAGNSDYPLKVIAKAGVDMTSADYLDAAFRVFEERLVELENLVAKGVH 599 

+D + YL YLK+GNSDYPL VIAKAGVDM DYL+AAF+VF+ERL ELE LV+KG+H 
Sbjct: 541 QDDIDHYLAYLKSGNSDYPLEVIAKAGVDMEKGDYLEAAFKVFDERLTELEVLVSKGIH 599 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2150 

A DNA sequence (GBSx2266) was identified in S.agalactiae <SEQ ID 6645> which encodes the amino 
acid sequence <SEQ ID 6646>. This protein is predicted to be competence protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2955 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC23745 GB:AF052209 competence protein [Streptococcus pneumoniae]
Identities = 127/269 (47%) , Positives = 176/269 (65%) , Gaps = 8/269 (2%) 

Query: 1 MLIAKDKQOlSn^IOTjLESHPGKGQYFCPTCCSRVlUjKWSRIMRRHF 60 
M +A+D +G L4-N+LE K Y CP C + L+ G +R HFAH SLK+C F+ KNE 



Query: SI SNEHI<3I,KmiYMSLSREl!ffiTN!LEHHLPEINQlADI.PUNETLAi:jE VQCSRLSEQRL 116 

S EHL K LY L +E + LE+ Ii B+ QIAD+PVN lALE V C + + L 
Sbjct: 51 SPEHIiANKESLYHWLKKBTKVQLEYPIjSBLKQI3®VFVNGinALESSVWPCL^ KVL 117 

Query: 117 RERTKAYLQADFQWWLLGEKLWLKHRLTNLHKQFLQFSQSIGFHIWELDLRLEVLRLKY 176 

4ER++ Y +QV WLHiG+KLWIiK RLT L FL FSQ++GF++WELD +VLRLKy 
Sbjct: 118 KERSEGYRSQGYQVLWLLGQKLWLKERLTRLQAGFLYFSQNMGFYVWEIiDKGKQVLRLKY 177 

Query: 177 LIYEDI,RGHVYYLSKTCPL-SGDVIiAFLKWPYQSKNIUFyKVKQDRNIRDY\7RQQLRYGN 235 

LIY+DLRG 4+Y K G +L L+ PY+ + ++ + V +D++I Y+RQQL Y N 

Sbjct: 178 LIYQDLEGKLHYQIKEFSYGQGSLIiEILRLPYKKQKISHFTVSEDKDICRYIRQQLYYC2N 237 

Query: 236 QPWLRK!QEKAYI.SGQil!lIJ:.TQEIjMMFFPQI 264 

FW+++Q +AY G+N+LT L ++PQI 
Sbjct: 238 LFWMKEQAEAYQKBENILTYGLKEWYPQI 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6647> which encodes the amino acid 
sequence <SEQ ID 6648>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>» Seems to have no N-terminal signal seq-jence 

Final Results 

bacterial cytoplasm Certainty=0 . 1034 (Affirmative) < suoo 

bacterial membrane -— Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 154/312 (49%), Positives = 204/312 (65%), Gaps = 1/312 (0%) 

Query: 1 MLIAKDKQGNLINLL-ESHPGKGQYFCPTCCSAVRLKRGRIMRRHFAHISLKNCQFYHEN 59 

+L A D + LI+L+ + K + CP C S VRI.+ G I R HFAH+ L +CQF EN 
Sbjct: 4 ILTALDDKNQLISLVTQPISTKPPFRCPACKSPVRLRQGTIRRPHFAHVQLAHCQFQAEN 63

Query: 60 ESNEHI^JLKAKLYMSLSREHETMLEHHLPEIHQIflDLFVNETLALEVQCSRLSEQRLRER 119

ES EHL LKAKLY SL R +E +-LPE+ QIflDL+VN+ LALE+QCS L +RL++R 

Sbjct: 64 ESEEHLTLKAKIiYTSLTOTEAVCIEKYLPELQQIJmLWVMJKLALEIQCSPLPVERLKKR 123 

Query: 120 TKAYLQaDFQVRmrX3EKLWLKHRI.TtJLHKQFLQFSQSIGFHIWEII)UiLEVIJiLKYM 179 

TKKY + + VRMLLG KLWL LT L KQFL FS S+GFH+MELD +LRLKYLI+ 
Sbjct: 124 TKAYQEKWPWWLLGRKIiHIitJTHLTAIflKQFLYFSSSLGFHLWEIjnAAaK^^ 183 

Query: 180 EDIJlGHVYYLSKTCPIjSGDVLAFLKMFyQSKNIiNFYKVKQDRN^ 239 

EDL 6 V YIi+KT L +++ + FYQ + L Y+ K N+ +++ L + WL 
Sbjct: 184 EDLFGKVSYLTKTISLDHNI^IEMFRLPyQQEILYSYQKK^mmLSmQRALLARHPKWL 243 

Query: 240 RKQEKAYLSGQl!^LLTQELMMFFPQIQPPRVDTDFCQITNSM'SFYQNPTImQKNK^^ 299 

R+QEKAYLSG HLL F+PQ + + FCQI +L +Y++F YY+K K+ 

Sbjct: 244 RRQEKAYLSGmLIMLTTIffiFYPQWRPVQSSSGFCQIKSNLRPYYESFKVYY 303 

Query: 300 QTLYPPVFYDKI 311 

QTI1+ p +"y K+ 

Sbjct: 304 OTLFSPKYYVKM 315 
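
The summary line that heads the GAS/GBS alignment above (Identities = 154/312, Positives = 204/312, Gaps = 1/312) can be checked arithmetically. The following is a minimal sketch only: the helper name `alignment_stats` is hypothetical and is not part of the original analysis, and it assumes that each reported percentage is the integer part of count/length, which may not hold for every summary line in this document.

```python
import re

# Hypothetical helper for illustration; not part of the original analysis.
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def alignment_stats(line):
    """Map each field to (count, aligned_length, truncated_percent)."""
    stats = {}
    for name, count, length in SUMMARY.findall(line):
        count, length = int(count), int(length)
        # Assumption: the reported percentage is the integer part of the ratio.
        stats[name] = (count, length, count * 100 // length)
    return stats

line = "Identities = 154/312 (49%), Positives = 204/312 (65%), Gaps = 1/312 (0%)"
for name, (count, length, pct) in alignment_stats(line).items():
    print(f"{name}: {count}/{length} = {pct}%")
```

For this line the recomputed values agree with the percentages printed in parentheses.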

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2151

A DNA sequence (GBSx2267) was identified in S.agalactiae <SEQ ID 6649> which encodes the amino
acid sequence <SEQ ID 6650>. This protein is predicted to be bicyclomycin resistance protein. Analysis of
this protein sequence reveals the following:



i cleavable N-term signal seq. 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood 

Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



= -1, 



TransTnembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



259 - 285 ( 267 - 281] 

290 - 306 { 287 - 314; 

203 - 219 ( 199 - 225; 

157 - 173 ( 143 - 184; 

53 - 69 ( 44 - 12 

352 - 378 ( 357 - 38] 

242 - 258 ( 240 - 26] 



107 - 123 ( 106 - 123; 



Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA15047 GB:AJ235272 BICYCLOMYCIN RESISTANCE PROTEIN (bcrl) 
[Rickettsia prowazekii] 
Identities = 86/336 (25%) , Positives = 159/336 (46%) , Gaps = 28/336 (8%) 





Query: 


73 


30 




70 






133 


35 


Sbjct: 


130 






191 




Sbjct: 




40 


Sbjct: 


243 


45 




298 




Sbjct: 


310 






350 


50 


Sbjct: 


367 



G++ VLLGL + ++S IS F+ N 



S+ D Y+ E 



HTLKASTTFDT KAALLMLITFLVGI AYIGATVKI PTLLVTKYHYATSFSSNM 242 

++S F+ K +L L P++G Y G ++ P +L+ + SF + 

AFSQSSKYFEVFNIIIKDKMLWLYAFIIGAENGIYYGFFIEAPFILIDQMRVLPSFYGKL 24 9 



f G Iri-K V+ +K + 



TAG+I G 1 +1 +T 



L + +V F Y+ L+ K K 



A related GBS gene <SEQ ID 8987> and protein <SEQ ID 8988> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 7 

McG: Discrim Score: 6.28

GvH: Signal Score (-7.5): -2.45
Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 10 value: -8.33 threshold: 
INTEGRAL Likelihood = -8.33 Transmembrane 
INTEGRAL Likelihood = -8.33 Transmembrane 
INTEGRAL Likelihood = -7.38 Transmembrane 



78 - 94 ( 75 - 
269 - 285 ( 267 - 
290 - 306 ( 287 - 
Likelihood =
Likelihood = 
Likelihood = 
Likelihood = -6, 
Likelihood = -3, 
Likelihood = -3, 
Likelihood = -1, 
PERIPHERAL Likelihood = 3,
modified ALOM score: 2.17



INTEGRAL 
INTEGRAL 
INTEGRA 
INTEGRAL 
INTEGRAL 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



Step: 3 

- 219 ( 199 -
- 173 ( 143 -
- 69 ( 44 - 73)
- 378 ( 357 - 381)
- 258 ( 240 -
- 345 ( 328 -
- 123 ( 106 - 123)

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01955(517 - 1449 of 1749) 

EGAD|163303|RP603(70 - 402 of 407) bicyclomycin resistance protein {Rickettsia prowazekii}
OMNI|NT01RP0625 conserved hypothetical protein GP|3861147|emb|CAA15047.1||AJ235272
BICYCLOMYCIN RESISTANCE PROTEIN (bcrl) {Rickettsia prowazekii} PIR|E71665|E71665
bicyclomycin resistance protein (bcrl) RP603 - Rickettsia prowazekii

%Match = 5.9

%Identity = 26.5 %Similarity = 52.0

Matches = 85 Mismatches = 141 Conservative Sub.s = 82



474 504 534 564 594 624 654 684 

SLVTIPAMMITlFVILSNFVVTKLGKKNTVLLGLCLILMSGPISFFTSNFSIAMftSRLLLGIGIQLYNSLSISIITDLYE 
30 |:-: mil : ::| || |: ] : | :|:: |: : : : |: 1 |: 

MTSTLYFLGFAVGILSLGRLSDIYGRRPIVLLGLPIYIVSSIISIFSFNIEMLMIARFIQAFGVSVGSVIGQSMARDSYQ 
60 70 80 90 100 IID 120 

714 744 774 .801 831 858 888 

3 5 ADERRSMIGLRT2iSLNIGKALTTPIVGLVLA- IGVNYI YLVYLLVI PVFP - FFWKNVPEVEN(3THTLKAST TFDT — 

I : : : : I |1 ::] | :: : :|::: : | ::: :::: :|1 ::| |: 

GAELSYWAILSPWLLFIPALGSYIGGYIIEYLSWHYVFIFFSLAGTILLALYYQILPETKYYIAFSCSSSKYFEVEKIIII 
140 150 160 170 180 190 200 

40 933 954 984 1014 1044 1074 1095 1125 

KARLLmiTFLVG---IAYIGATVKIPTLLWKmyATSFSSNMLTLLAPSGILVGSVFGKLVK---VFQEKTLI,IMILA 
I :| I h:| I I :: I :|: : || : |h|: ]: | : | |:| |: H = I : 

KDKMLWTLYAFIIGAFNGIYYGFFIEAPFILIDQMRVLPSFYGKLAFLLSPASIFGGFLGGYLIKKRQVYDKKVMSIGFIF 
220 230 240 250 260 270 280 

45 

1155 1182 1209 1224 1254 1284 1311 1335 

MGIGNVLFALRNNQI IFI - VAS IL - IGASFVGTM SSVFFY ISKNYAKEHNKFITSLALTAGNI -GVILTPLI - - L 

I =111: : = II |:: = = I 1= I 1 = = h III =11 hi I I =1 = 

SLCGCILFAVDSPILEFILVSNVFAIAMIFMPMMIHMIGHSLLIAITLRYALEDYATVTGTA- - -GSIFGAIYYWIASV 
50 300 310 320 330 340 350 360 

13S5 1395 1419 1449 1479 1509 1539 1569 

TKLPSQLHLEPFMTPFLITSGLIWINV--FWLVLMSKNK*KVIRKmFPRIVKUGEKMLIAKDKQGa!ILINLLESHPGKG 
I l=:| I |: I : =1 I |: |: I I 

55 TYCVSKIHGETISNFSLLCLVLSISSVISFYYICLLYKKKSIIIN 
380 390 400 

There is also homology to SEQ ID 400 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
Example 2152

A DNA sequence (GBSx2268) was identified in S.agalactiae <SEQ ID 6651> which encodes the amino
acid sequence <SEQ ID 6652>. This protein is predicted to be 16S pseudouridylate synthase (rsuA).
Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2645 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
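
The three localization scores listed under Final Results can be read programmatically. The sketch below is illustrative only: the function names `localization_scores` and `predicted_site` are hypothetical, the sample text is a retyped copy of the three result lines, and the pattern is deliberately tolerant of stray spaces such as those found in scanned copies of the listing.

```python
import re

# Hypothetical parser written for illustration; not part of the original analysis.
RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)"
)

def localization_scores(text):
    """Return {site: certainty} for every 'bacterial <site>' result line."""
    scores = {}
    for site, whole, frac in RESULT.findall(text):
        scores[site] = float(f"{whole}.{frac}")
    return scores

def predicted_site(text):
    """Return the site with the highest certainty score."""
    scores = localization_scores(text)
    return max(scores, key=scores.get)

sample = """
bacterial cytoplasm --- Certainty=0.2645 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(localization_scores(sample))
print(predicted_site(sample))
```

Taking the maximum score is a simplification chosen for this sketch; the listing itself marks the favoured site with the label (Affirmative).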

The protein has homology with the following sequences in the GENPEPT database. 

>GP:B3VB06992 GB:AP001518 16S pseudouridylate synthase [Bacillus halodurans] 
Identities = 106/234 (45%), Positives = 141/234 (59%), Gaps = 1/234 (0%) 

Query: 1 MRLDKLLGQAGFGSRNQVKKLICSRQVSVDGQIVTKDNVIVDSGLQSIFVGKERVCLKES 60 
MR+DK I) GFGSR VKKL+ + VVGQ+ +V+ +SIVEVK 
' Sbjct: 1 MRIDKFLaHMGFGSRKDVKKLLKTGAVRVQGQPIKDPSTHVEPESESITVYGEEVEYKPY 60 

Query: 61 iSYYLLYKPSGWSaVRDSEHKWIDLISEKDKVBGLYPIGRLDRDTEGLLrVT^^lraPLGY 120 

Y ++ KP SV+ A D EH+TVTDL+ E+++ P+GRLD+DT GLI1++TN+G + 

Sbjct: 61 VYII^^^raKPKBVICRTEDLEHETVIDLLGKEERHYEPSPVGRBDKDT^7GLI^ITIro 120 

Query: 121 EMI^PKHHVaKTYYVEVNGFLERnAITFFEEGVVFDDaTKCKPAELTIDTaNNDKSTARI 180 

++ PKHHV KTY V G + + + F GW DDG KPA L I A +S + 
Sbjct: 121 VraSPKHHVPKTYRfiLVEGHVTEEDVGAFSHGVVLDIXSYVTKPATLHILEA-GSUlSHIEL 179 

Query: 181 TITEGKFHQVKKMFIAYGVKVIYIiRRlSFGDLRIiDMNLKPGQYRRLR^ 234 

+TEGKFHQ7K+MF A G +V+ L HI G+L LD L G+YR L E A+L 
Sbjct: 180 ILTEGKFHQVKRMFQAVGKRVLEnERIKIGNLLLDPELARGEYEELTKEEIALL 233 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6653> which encodes the amino acid
sequence <SEQ ID 6654>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3310 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/194 (57%) , Positives = 138/194 (70%) 

Query: 1 MRLDKLLGQAGFGSRNQVKKLICSRQVSVDGQIVTKDNVIVDSGLQSIFVGKERVCLKES 60 

MRLDKLL GSR+QVKKLI ++ V VD VD GLQ I V +RV + 

Sbjct: 1 MRLDKLLEGTKVGSRSQVKKIIKAQQVWVDHMPARNGRQNVDPGLQLIEVTGQRVTHPKH 60 

Query: 61 STYYLLYKPSGVVSAVRDSEHKTVIDLISEKDKOTlSLYPIGRLDRDTEGriLIVTNNGPLQY 120 

SY +L KPSGWSA +r)+ + TVID ++E+DK LYP+GRLDRDTE6rrf-++T+NGPLG+ 
Sbjct: 61 SYIILNKPSGWSAKKD™YLTVIDQLAEEDKSPDi:iYPVGRr.DRDTEGLVLLTDNGPLGF 120 

Query: 121 RMLHPKHHVAKTYYVEWGFLERDAITFFEEGWFDDGTKCKPAELTIDTANNDKSTARI 180 

RMLHP HHV+KTY V WG I. DA FF G-H F G +C+PA+LTI A+ D+S A + 
Sbjct: 121 RMLHPSHHVSKTYLVTVNGLLAEDASDFFAAGICFPTClEQCQPAQLTILKaDTDQSQASL 180 

Query: 181 TITEGKFHQVKKMF 194 

TI+EGKFHQVKK F 
Sbjct: 181 TISEGKFHQVKKCF 194 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2153 

A DNA sequence (GBSx2269) was identified in S.agalactiae <SEQ ID 6655> which encodes the amino 
acid sequence <SEQ ID 6656>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9745> which encodes amino acid sequence <SEQ ID 9746> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 22 MGLLVDGKWVDQWYDTftSl<MKFVRTVTQFRHWVTKDGSaGPSGnAGPKAESGRVHLW 81 

MGLLV+G W DQWYDT STGG+FVR +QFRHW+T DGS GP+G GFKAE+GRraLyVS 
Sbjct: 1 ^CLLWaIWQDQWYDTESTGGEFVRHDSQPRHWITPr)GSPGPTGHGGFK3ffiRGR■YHLYVS 60 

Query: 82 LACPWASRVLIMRKLKNLESHISISIVNPLMLENGWTFQEYKGVIPDMINQSQYLYQIYQ 141 

LACPWA R LI RKLK LE I +S+V+ LM ENGWTF GV+PD + ++YLYQIY 
Sbjct: 61 LACPWAHRTLIFRKLKGLEGMIDVSWHWLMRENGWTFAPGPGVMPDPLFNAEYLYQIYT 120 

Query: 142 ASQSDYTGRVWPVLTOKKFHTITONESSEIMRPIIOTAFISKITGm'DD 201 

+ + Y+GRVWP+LWDK+ TIVHNESSEI+R+ N+AF+ + + DYYP +L4 QID 
Sbjct: 121 RaD2VQYSGRVWPIIMKQKariVNNESSEIIRIEHSaFDGU3aKSGDYYPK3^ 180 

Query: 202 MNNFIYPKISINGVYKRGFATSQIWYQKEWETLFTALDQIiEKHLSDiraYLVBEaFTEM 261 

+N+ lY INNGVYK GFAT+Q Y++ + LF +LD LE L + YL G-1-+ TEAD R 
Sbjct: 181 lOTRIYHTINNGVYKCGFATTQTAYBEAIAPLFESLDWIiEGILQGHQYLTGDEITEADWR 240 

Query: 262 LFTTLVRFOTVYYGHFKCtttKAIiHDYPHLram'KRIYNLPGIAETVNFDHIKKHYYGSHK 321 

LFTTL+RFD VY aHFKCNL+ + DYP+LM Y + +Y+ PGUffiTVNF HIK HXY SH 
Sbjct: 241 LFTTLIRFDWYVGHFKCNLRRIQDYPNLWRYIiRDLYHQPGIAETVNFQHIKGEIYYESHL 300 

Query: 322 TINPTGIIPAGPNLDWTI 339 

INPTGI+P GP LD ++ 
Sbjct: 301 NmPTGIVPMGPALDLSL 318 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 6656 (GBS655) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 143 (lane 2-4; MW 27kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2154 

A DNA sequence (GBSx2270) was identified in S.agalactiae <SEQ ID 6657> which encodes the amino 
acid sequence <SEQ ID 6658>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1116 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12030 GB:Z99105 similar to glucosamine-6-phosphate isomerase
[Bacillus subtilis]
Identities = 112/243 (46%) , Positives = 163/243 (66%) , Gaps = 10/243 (4%) 

MRVITVKNDIEGGKIAFTLLEEKMKAGAQT-LGIATGSSPITFYEEIVKS MLDFSN 55 

M+++ ++ E K++ +++E+++A LGLATGS+P+ Y++++ +DFS 

MKIIiIAEHYEELCKLSAAIIKEQIQAKKDAVLGLATGSTPVGLYKQLISDYQAGEIDFSK 60 



^ NLDEY 0++ S+ QSY++FMH+HLF 





1 


Sb j ct : 


1 




56 


Sbjct: 


61 


Query: 


114 


Sbjct: 






172 


Sbjct: 


181 




231 


Sbjct: 


241 



+SMGI +IM+ SK IVL+A G EKA+AI M +GP+T D+PASILQKH+ V +1 D AA 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6659> which encodes the amino acid 

sequence <SEQ ID 6660>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 174 - 190 ( 174 - 190)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12030 GB:Z99105 similar to glucosamine-6-phosphate isomerase
[Bacillus subtilis] 
Identities = 120/244 (49%) , Positives = 162/244 (66%) , Gaps = 12/244 (4%) 

Query: 1 MKIIRVQDQIEGGKIAFTDLKDSI.-AEGRKTLGLATGSSPISFYQEM\'KS PLDFSD 55 

MKI+ + E K++ ++K+ + AK IiGLATGS+P+ Y++++ +DFS 
Sbjct: 1 MKILIAEHYEELCKLSAAIIKEQIQ&KKDAVLGLATGSTPVGLYKQLIBDYQAGEIDFSK 60 

Query: 56 LTSINLDEyVGI,SVESDQSYDYFtffiQNLF---]SOUCPFKKira,ENGLATDVEAEAKRYNQI 112 

+T+ NLDEY GLS QSY++FM ++LF N +P ++P G +EA K Y + 
Sbjct: 61 VTTEIttXlEYAGI^PSHPQSYNHFMHEHLFQHIHMQP-DHIHIPQGDIJPQLEAACKVYEDIi 119 

Query: 113 lAEHP-IDFQVLGIGRNGHlGFNEPGTSFEEETHWDLQESTIEANSRFFTSIED-VPKQ 170

I + ID Q+L6I6 NGHIGFNEPG+ FE+ T W L ESTI+AN+RFF VP+ 
Sbjct: 120 IR(».GGIDVQIIK3IGANt3HIGFIffiPGSDFEDRTR\AnCLSESTIQANaRFFGGDPVLVPRL 179 

Query: 171 AISMGIASIMK-SEMIVIlAFGQEKaDAIKGMVFGPITEHLPASILQKHDHVIVIVDEAA 229 

AISMGI +IM+ S+ IVLLA G+EKBDAI+ M GP+T +PASILQKH+HV VI D A 
Sbjct: 180 AISMGIKTI^ffiPSKHIVLIASGEEKADAIQK^MGPml)VPASILQKHNHVTVIADYKA 239 
Query: 230 ASQL 233
A +ls 

Sbjct: 240 AQKL 243 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/233 (69%) , Positives = 201/233 (85%) 

MRVITVKNDIEGGKIAFTLLEEKMKAGAQTLQLATGSSPITFYEEIVKSNLDFSNMVSIN 60 
M++I V++ 1EGGKIAFTLL++ + aa.+TLGLATGSS?I+Py+E+VKS LDFS++ SIN 
MKIIRVQDQIEGGKIAFTLLKDSLAKGAKTLGLATGSSPISFYQEMVKSPLDFSDLTSIN 60 



Q+LGIGRNGHIGENEPGT F+ THWOL STIEftNSRFF SI+DVEKQR+SMGI SIM 





1 


Sbjct: 


1 


Query: 




Sbjct: 


61 




121 


Sbjct: 


121 




181 


Sbjct: 


181 



KS+ IVL+A4G EKA+AI M+ GPITE +PASILQKHD V++rVDEAAAS+L 
KSEMIVLLAFGQEKADAIKGMVFGPITEHLPASILQKHDHVIVIVDEAAASQL 233 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2155 

A DNA sequence (GBSx2271) was identified in S.agalactiae <SEQ ID 6661> which encodes the amino
acid sequence <SEQ ID 6662>. Analysis of this protein sequence reveals the following:



Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8 Transmembrane 169 -
INTEGRAL Likelihood = -6 Transmembrane 151 -
INTEGRAL Likelihood = -5 Transmembrane 42 -
INTEGRAL Likelihood = -1 Transmembrane 207 -
INTEGRAL Likelihood = -1 Transmembrane 24 -

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 9 QQENILRAGVLGANDGIISVAGWIGVASATHNIWIIFLSAASAILAGAFSMAGGEYVSV 58 

+Q+ LRA VLGANDGl+S + ++IGVASA + I L+ S ++AGA SMA GEYVSV 
Sbjct: 17 RQMGWLRASVICT^^^GILSTSSL^a:G^m£AHGSSGNILLaGMSGLIAGaLSMAAGEYVSV 76 

Query: 69 STQKDTEQAAVaREEKLLEMNPELaKKSLVDIYlliAKBESHEHaQWLVDKIUFSKN^^ 128 

S+Q D EQA VaRE L+ MP K L +IY-H +G E A + ++ + NA+E + 
Sbjct: 77 SSQHDMEQJffiVRREHAELKANPHaEKHEXiaEIYVERGLDRELaLQVaEQUIAHNaLEAHL 136 

Query: 129 EEKYGIEFGEYTSPWHAAISSPIAFAIGSIFPTITIIiLLPFSVRIVGTVIIVIVSLLSTG 188 

++ G+ P AA++S I+F+ G+I P +T L P + + +1 1+ L G 

Sbjct: 137 RDELGLTDSLIARPVQAALASAISFSGGAIVPFLTAIiFSPPEIINITISLISILCLAVIiG 196 

Query: 189 YVSAKLGQAPTVPAMRENVMIGCLTMLATYVIGQLF 224 

VALGA AR GLM+TIGF 
Sbjct: 197 MVGAHLGGANVPKaALRVTFCGALAMIGTAAIGSFF 232 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2156 

A DNA sequence (GBSx2272) was identified in S.agalactiae <SEQ ID 6663> which encodes the amino
acid sequence <SEQ ID 6664>. This protein is predicted to be S-adenosylmethionine tRNA
ribosyltransferase (queA). Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3438 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14732 GB:Z99118 S-adenosylmethionine tRNA ribosyltransferase 
[Bacillus subtilis] 
Identities = 228/341 (66%) , Positives = 279/341 (80%) 

Query: 1 ^fflmroFDFYLPEELIAQTPI^KRnaSKIiVIDHKNKTMTDSHFDHILDELKPGlmV^^ 60

M + FDF LPE LIAQ PLE+RDaS+L+V+D +TDS F HI+ GD LV+NN
Sbjct: 1 MKVDLFDFELPERLIAQVPLEQRDASRmVLDKHTGELTDSSFKHIISFENEGDCLVLNN 60

Query: 61 TRVLPARLYGEKQDTHGHVELIiLKlTrEGDQWEVLRKPAKRLRVCTKVSFGDGRLIATVT 120

TRVLPARL+G K+DT VELLLLK GD+WE LRKPAKR++ GT V+FGDGRL A T
Sbjct: 61 TRVLPaRLFGTKEDTGAra7ELLLLKQETGDK]«IETLAKPAKRVKKGTVVTFGDGRLKAICT 120

Query: 121 KELEHGGRIVEFSYDGIFLEVI^lSLGEMPLPPyiHEKLEDRDRyQTVYAKEIJGSaAAPTA 180

+ELEHGGR +EF YDGIP EVLESLGEMPLPPYI E+L+D++RYQTVY+KE GSAAAPTA
Sbjct: 121 EELEHGGRKMEFQYDGIFYEVLESLGEMPLPPYIKEQLDDKERYQTVYSKEIGSAAAPTA 180

Query: 181 GLHFIlCELLEKIKTKGVKLVYLTLHTOLGTEra'VSVimiDEHEMJSEFYQLSKEAADTIM 240

6LHFT+E+L++++ RGV++ ++TLHVGI1GTFRPVS D ++EH MH+EPYQ+S+E A LN
Sbjct: 181 GUaFTEEILQQLKDKGVQIEFITLHVGLGTFRPVSaDEVEEHNMHAEFYQMSEETAARLK 240

Query: 241 AVKESGGRIVAVGTTSIRTI^TIGSKFNGELKaDSGVraJIFIKPGYQFKVVnAFSTNnFHL 300

V+E+GGRI++VGTTS RTLETI + +G+ KR SGWT+IFI EGY+FK +D TNFHL
Sbjct: 241 KVRENCSGRIISVGTTSTRTLBTIAGEHDGQFKASSGMTSIFIYEGYEFKAIDGMITNFHL 300

Query: 301 PKSTLVMLVSAFAGRDFVLEAYKHAVEERYRFFSFGDAMFV 341

PKS+Irt-MLVSA AGR+ +L AYNHAVEE YRFFSFGDAM +
Sbjct: 301 PKSSLIMLVSALAGRENiriRAYNHAVEEEYRFFSFGDAMDI 341





A related DNA sequence was identified in S.pyogenes <SEQ ID 6665> which encodes the amino acid
sequence <SEQ ID 6666>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3864 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.



Identities = 297/341 (87%) , Positives = 322/341 (94%) 
Query: 1 MNTMDFDFYLPEELIAQTPLEKRDASKLLVlDHKNKTiyffDSHFDHILDELKPGDJU^Viy™ 60

MNTN+FDF LPEELIAQTPLEKRD+SKLL+IDH+ KTM DSHPDHI+D+L PGD2VLVMNN 
Sbjct: 1 MNTNNFDFELPEELIAQTPBEKRDSSKDLIinHRQKmOJSHF^ 60 

Query: 61 TRVLPARLYGEEQDTHGHVELLLLKKTTEGDQWEVIAKPAiail^ 120 

TRVLPARLYOEK DTHCSHVEEUiLKNT+GDQWBVLaKPAK^ AT+ 
Sbjct: 61 TRVLPARLYGEKPDTHGHVELLLLKimjGDQKEVLaKPAKRLKVGSQVNFGDGR]^ 120 

Query: 121 KEa^aQRIVEPSYDGIFIBVIJ!SIX3EMP]a>PyiHEKIiEDRDRyQTVyAKENGS^^ 180 

ELEHGGRIVEFSYDGIFLEVLESLGEMPLPPYIHEKLED +RTOrVYAKENGSAARPTA 
Sbjct: 121 DELEHGtSRIVEFSYDGIFLEVLESLGEMPLPPYIHEKLEUaERYOTVYAKENGSaaAPTA 180 

Query: 181 GLHFTKELLEKIETKGVKLVYLTLHVGLGTFRPVSVDNLDEHEMHSEFYQIjSKEftaDTIiN 240 

GLHFT +LL+KIE KGV LVYLTLHVGLGTFRPVSVDNLDEH+MHSEFY LS+EAA TL 
Sbjct: 181 GLHFTTDLLKKIEAKGVHLVYLTLHVGLGTFRPVSVDJILDEHDMHSEFYSLSEEflAQTLR 240

Query: 241 AVKESGGRIVAVGTTSIRTLETIGSKFNGELKADSGWTKIFIKPGYQFKWDAFSTNFHL 300 

VK++GGR+VAVGTTSIRTI1ETIG KP G+++ADSGtraiIFIKPGYQFKWDAFSTNFHI. 
Sbjct: 241 DVKQAGGRVVAVQTTSIRTLETICMKFQGDISffiSGWTtJIFIKPCKQFKVVDAFSTNFm 300 

Query: 301 PKSTLVMLVSAPAGRDPVLEAYMHaVEERYRFFSFGDaMSV 341 

PKSTLVMLVSAFAGRDFVLEAY HAV+E+YRFFSFGDAMFV 
Sbjct: 301 PKSTLVMLVSAFAGRDFVLEAYRHAVDEKYRFFSFGDAMFV 341 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2157 

A DNA sequence (GBSx2273) was identified in S.agalactiae <SEQ ID 6667> which encodes the amino
acid sequence <SEQ ID 6668>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.22 Transmembrane 14 - 30 ( 6 - 34)

Final Results

bacterial membrane --- Certainty=0.6689 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
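
An INTEGRAL line such as the one above carries a likelihood value and two residue intervals. The following sketch shows one way to pull those numbers out; the `Segment` type and `parse_segments` function are hypothetical names introduced for illustration, and the parenthesised interval is treated here simply as a second, wider range without assuming anything further about its meaning.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    """One predicted transmembrane segment (hypothetical container)."""
    likelihood: float
    start: int
    end: int
    outer_start: int
    outer_end: int

    @property
    def length(self):
        # Number of residues in the first (unparenthesised) interval.
        return self.end - self.start + 1

LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return a Segment for every complete INTEGRAL line found in text."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in LINE.findall(text)
    ]

text = "INTEGRAL Likelihood =-14.22 Transmembrane 14 - 30 ( 6 - 34)"
for seg in parse_segments(text):
    print(seg.likelihood, seg.start, seg.end, seg.length)
```

Lines whose intervals were truncated in scanning do not match the pattern and are skipped rather than guessed at.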

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6669> which encodes the amino acid 
sequence <SEQ ID 6670>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2655 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 126/195 (64%), Positives = 155/195 (78%), Gaps = 1/195 (0%)

Query: 160 MEERFDITETDYEYIGEHNNYVAAFSQRMSIDDMQKYSLVYSENTPAYALAERIGGMDSA 219 

M ERFDITETDYEY EH+ YVA F+GAMSI DMQ+YSLVYSENTPAYALAER+GGM+ A 
Sbjct: 1 MTERFDITETDYEYDQEHHAYVAQE^OiMSIPD^^QEYSLVYSENTPAYAIlAERLGG^lNKa 60 

Query: 220 YSKFGRYGQSKGDIKNIQKNGNKVTTDYYIQVLDYLWOIRKKXDSLITYLEEAFPTDYYR 279 

Y F RYG+ G I I +NGNK+rr Yy+QVLDYLW+H+ KY ++ Y+ E+FP YY+ 
Sbjct: 61 YQLFDRYGKVSGAITTIDRMGNKITTAYYLQVLDYLWQHQDKYKDILYYIGESFEDLYYK 120 
Query: 280 ALIPSDWVAQKPGYVREALNVGAIVKEEVPYIVAIYTAGLGGSTQEDSEINGVGLYQLE 339

+P V V QKPGYVREAiaWGAIV EE PY++A+Y++BLGG+TQ E+NG+G QL
Sbjct: 121 TYLP-HVK\«'QKPGYTOE7UaqVGAIVCEESPYLIALYSSGIiGGATC3RSEEVNGLGY^ 179

Query: 340 QLCEVINQWHRVHMN 354

QL +VIN+W+R N+N
Sbjct: 180 QLPYVINEWYRGNLN 194



SEQ ID 6668 (GBS680) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 164 (lane 10-12; MW 64kDa) and in Figure 239 (lane 9; MW 64 kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
164 (lane 15; MW 40kDa) and in Figure 188 (lane 9; MW 40kDa). Purified GBS680-His is shown in Figure 
242, lane 8. Purified GBS680-GST is shown in Figure 246, lanes 6 & 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2158 

A DNA sequence (GBSx2274) was identified in S.agalactiae <SEQ ID 6671> which encodes the amino 
acid sequence <SEQ ID 6672>. Analysis of this protein sequence reveals the following: 

Possible site: 17 



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 8 - 24
INTEGRAL Likelihood = -2.13 Transmembrane - 82 ( 65 - 84)
INTEGRAL Likelihood = -1.65 Transmembrane 107 - 123 ( 107 - 125)
INTEGRAL Likelihood = -0.69 Transmembrane 36 - 52 ( 36 - 52)
INTEGRAL Likelihood = -0.48 Transmembrane 89 - 105 ( 89 - 105)

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2159 

A DNA sequence (GBSx2275) was identified in S.agalactiae <SEQ ID 6673> which encodes the amino 
acid sequence <SEQ ID 6674>. Analysis of this protein sequence reveals the following: 



Possible site 

INTEGRAL 
INTEGRRI, 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



signal seg 
Transmembrane 108 - 
Transmembrane 181 - 
Transmembrane 220 - 
Transmembrane 6 - 



124 ( 

197 { ] 

■ 236 ( : 

■ 22 ( 
■417 ( < 
-295 ( : 

- 47 ( 

- 260 ( : 



WO 02/34771

PCT/GB01/04789

-2433-

Final Results

bacterial membrane --- Certainty=0.4949 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC21770 GB:U32694 H. influenzae predicted coding region HI0092
[Haemophilus influenzae Rd] 
Identities = 232/416 (55%) , Positives = 314/416 (74%) , Gaps = 3/416 (0%) 

10 Query: 4 TFTTrGaLI6IJUAILriIIKKVHPAYSLIMM.VGGLIGGGDLVTIV^^ 63 

T + Q2^+ Ij +AI LI+KKV PAY +++GALVGGLIGG DL V+ M+ GRQG+ ++ 
Sbjct: 3 TVSAIGM.VaLIVaiFLILKK7SPAYGMLVG&LVGGLIGGRDLSQTVSLMIGGAQGITTA 62 



Query: 64 IIiRILTSGILAGRLIKTOSAEKIAESIIKKLGQQRAITALAIATMIICAVGVFIDIAVIT 123 

++RIL +G+LAG LI++G+A I E+I KLG+ RA+ ALA+ATMI+ AVGVF+D+AVIT 
Sbjct: 63 VMRIIJUiGVIAGVLIESGaANSITETITNKLGETRAIJlMALA™ 122 

Query: 124 VAPIAIAIGKKANLSKSSILMmiGGGKAGNIISPNENTIAASEAFKVDLTSI^ 183 

V+PIALA+ ++4-+LSK++ILIAMIGGGKaGNI+SENEN IflA++ P + LTS+M+ IIP 
Sbjct: 123 VSPIAliaLSRRSDLSKAAII.Ii!»IIGGGKACaiIMSPNENAIA2UfflTFHLPLTSAMlAGIIP 182 

Query: 184 AIAALWTIILAKIVSKKNMDISYDSEEQV--GSDLPAFLPAISGPLWICLLALRPLFG 241 

A+ L++T LAK + K + ++ D E V +LP+FL A+ PLV I LLALRPLF 
Sbjct: 183 ALFGLILTYFLAKRLINKGSKVT-DKEVIVLETQNLPSFLTALVAPIiVMLLLALRPLFD 241 

Query: 242 ITIDPLIALPLGGLISIIATGYLKETVPFVEYGLSKWGVSILLIGTGTLSGIIKASNLQ 301 

I +DPLIALPII1GGLI G I1+ + GLSK+ V+I+L+GT6 L+GII S Ij+ 

Sbjct: 242 IKVDPLIALPIiGGLIGAFCMGKLRNINSYAIKGLSKMTPVAI^ILLGTGftLAGIIANSGLK 301 

Query: 302 FDMIHIiEFL^mPTFIIAPI.SQIFMQaATASTTSGTTIASQTFAETLIKSGVPAVSGA2iM 361 

+1 IiE +P++ILAP+SG+ M ATASTT+GT +AS F+ TL++ GV +++GAAM 
Sbjct: 302 EVLIQGLEHSGa:.PSYILJiPISGVMSLATASTTAGTa\fflS]WFSSTr.r.EI^^ 361 

Query: 362 IHfiGATVLDSLPHGSFFHATGGAVNMAIKDRMKLISYEALIGLTSTIVAVVYYCFF 417 

IHAGATV D +PHGSFFHATGG+VNM IK+R+KLI YE+ +GL TIV+ + + F 
Sbjct: 362 IHAGAWFDHMPHGSPPHATGGSVIiOTIKERLKLIPYESAVGLMMTIVSTLIFGVF 417 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6675> which encodes the amino acid 
sequence <SEQ ID 6676>. Analysis of this protein sequence reveals the following: 



Possible 

IMTEGRRL 
INTEGRAL 
INTEGRAL 



INTEGRAL 
INTEGRAL 

INTEGRAL 
INTEGRAL 



site: 51 

ive an uncleavablc 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



-terra signal seq 
Transmembrane 2 
Transmembrane 
Transmeiribrane 2 



11 



256 ( 236 - 255)
19 ( 1 - 32)
285 ( 263 - 289)
123 ( 102 - 141)
323 ( 303 - 330)
40 ( 23 - 43)
438 ( 420 - 442)
140 ( 124 - 141)
205 ( 184 - 207)
81 ( 65 - 82)
409 ( 393 - 409)
165 ( 149 - ISS)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:BAB07616 GB:AP001520 unknown conserved protein [Bacillus halodurans]
Identities = 155/435 (35%) , Positives = 248/435 (56%) , Gaps = 21/435 (4%) 

Query: 7 LGVLVGVIVIIYLYVKEVNIIIAAPLATSLVILFNQMDPTTTLLGKEENQFMGMiSTYIL 66 
LG+++G+++++ L + +11 AP+A +V LF +D hh + +M +
Sbjct: 2 LGIVLGLVIMVIAYRGWSIIIimPIJUVGVVM.FGGLD----LLPAYTDT^ 57 

Query: 67 NyFAIFLLGSIIJUa>»IETSGATTSIADYILKKTOHDSFYKVLVAIF]:.ISAILTYGGISi:.F 126 

+P +F+LG+I KLME +GA S+A I K +G + ++ + L A+LTYGGISLF 

Sbjct: 58 QWFPVFMLGAIFGKLMEDTGaaRSVASAITKLIGTK---RAILG\mi/^ 114 

Query: 127 VVMFAVIjP]J\RSLFKKMDLAraiLIQVPLWI.GIATFTMTILPGTPAIQNVIPlQYLD^^ 186 

W+FA+ PLA +LF++ +++ LI + LG TFTMT +PGTP IQN+IP Y T+ 
Sbjct: 115 WVFAMyPLAIALFRERNISRRLIPGTIAIXSAFTFTOTAVPGTPQIQmiPTSYYGTNAM 174 

Query: 187 AAAIPSIVGSIGCra^GLFYMKYC^iAKSMARGETYATYAFDNEIQVKTKNLPHFLASILP 246 

AA + ++ ++ G Y+ + K GE + T + E + + + +P+ S LP 

Sbjct: 175 AAPMMGVIAALIMGIGGYTYLVWREKKLKEAGE-FFTEPKNGEKEEEGEKVPNPWLSFLP 233 

Query: 247 LLLLIIIALTGSLFGIJDFFKKNIIFl?j:.LAVILTASWLFRQFIPNKIAVFNLGASSSIAP 306 

L+ +1+ T +L D I +AL++ 1+ L + I N GA S+ 

Sbjct: 234 LVSVIV---TLNLLQWD -IVLALISGIVLIMLMVGKVKGFIQSMNQGAGGSVLA 284 

Query: 307 IFATASAVAFGAWMIVPGFTFFSDLILNIPGNPLISLAVLTSSMSAITGSSSGALGIVM 366 

I T++AV PG+W VPGF ++L+L I G+PLIS AV + ++ TGS+SG +GI + 
Sbjct: 285 IINTSAAVGFGSVVRAVPGFERLTELLLGIQGSPLISQAVAINVLAGATGSASGGMGIAL 344 

Query: 367 PNFAQYYLDQGIOTEMIHRVATIASNIFTIVPQSGVFLTFLALTGIMHKNAFKETF 422 

+ Q ++ G++PE HRVA+IAS +P +G LT,LA+TGL+HK ++K+ F' 

Sbjct: 345 EAI/3DRYMQ1AMETGMSPEAFHRVASIASGGLDTLPHNGAVLTLLAITGLSHKESYKDIF 404 

Query: 423 ITVSVSTFIAQVIVI 437 

+ V ++ I 
Sbjct: 405 WGCVIPIVBVAFAI 419 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/395 (22%) , Positives = 167/395 (42%) , Gaps = 40/395 (10%) 

Query: 9 GALIGLALAILLIIKKVHPAYSLILGALVGGLIGGGDLVTIV NTMVLGAQG- -MMS 62 

G I.+G+ + I L +K+V+ + L + L D T + +QA +++ 

Sbjct: 8 GVLVGVIVIIYLYVKEVNIIIAAPLATSLVILFNQMDPTTTLLGKEENQFMGaLSTYILN 67 

Query: 63 SILRILTSGILAGALIKTGSAEKIAESIIKKLGQQ---RAITAIiAIATMIICaVGVFIDI 119 

L ILA + +G+ 1A4 I+KK+G + + A+ + + 1+ G+ + + 

Sbjct: 68 YFAIFLLGSILAKLMETSGATTSIADYILKKVGHDSPYKVLVAIFLISAILTYGGISLFV 127 

Query: 120 AVITVAPIALAIGKKANLSKSSILLAMIGGGKAGNII SPNPNTIAASEAFKVDLTS 175 

+ V P+A ++ KK 4-L+ + I + +G + +P ++ LT+ 

Sbjct: 128 VMFAVLPLARSLFKKMDLAWKLIQVPLWLGIATFTMTILPGTPAIQNVIPIQYLDTSLTA 187 

Query: 176 LMVQKIIPAIAALWTII LAKIVSKKNNDISY--DSEEQVGS-DLPAFLPAISGP 227 

+ +H- +1 + + LAK +++ +Y D+E QV + +LP FL +1 

Sbjct: 188 AAIPSIVGSlGCTAFGLFYMKYa:AKS^Ia3M3ETYATYaFDMEIQiVKTK^ 247 

Query: 228 LWICLLALRPLFG ITIDPLIALPLGGLISILATGYLKETVPFVEYGLSKWG 280 

L++I + LFG I L+A+ L SL+++ GS + 

Sbjct: 248 LLLIIIALTGSLFGOT)FFKKNIIFIALLAVIL--TASVOiFRQFIPNKIAVFlSILGASSSIA 305 

Query: 281 — VSILLIGTGTLSGIIKASNLQFDMIHLLEFLNMPTFILAPLSGIFMGAATASTTSGT 337 

+ + G + 1+ D+I L P LA L+ MAT S++ 

Sbjct: 306 PIFATASAVRFGAWMIVPGFTFFSDLI--LNIPGNPLISLAVLTS-SMSAITGSSSGAL 362 

Query: 338 TIASQTFAETLIKSGVPAVSGAAMIHAGATVLDSL 372 

I FA+ + G+ MIH AT+ ++ 

Sbjct: 363 GIVMPNPAQYYLDQGL NPEMIHRVATIASNI 393 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
Example 2160

A DNA sequence (GBSx2277) was identified in S.agalactiae <SEQ ID 6677> which encodes the amino
acid sequence <SEQ ID 6678>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.24 Transmembrane 85 - 101 ( 84 - 101)

Final Results

bacterial membrane --- Certainty=0.2296 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 







Sbjot: 


1 


Query: 


61 


fibjot: 


61 


Query: 


121 


Sbjct: 


121 




181 


Sbjct: 






241 


Sbjct: 


241 


Query: 


301 


Sbjct: 


301 




361 


Sbjct: 


351 



h A+IEMARA+G-I- L+ ++RNPL TTT G GE+I 



IIGIGGSATNDGGAGM+QALG ' LLD 



EEliKECDFKIACDVTNPLCGAQGCSSIFGPQKGADEDMITKMDTWLSKYATrATaVSEKA 240 

L+ ++AC4-V NPL G +G +++FGPQKGA DM+ +D +S++A +A 
SRLRWKIiEVACOTDNPLTGPKGATAVFGPQKGATAPMLDVLDQNVSHFADMAEICaLGST 240 



EG GftaGGLG++ L + A L+ GIDI+L ++ E + +ADLV+TGEGR+D QTV 



GK PIGVAK AK Y V+ +GS++ D+ 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6679> which encodes the amino acid
sequence <SEQ ID 6680>. Analysis of this protein sequence reveals the following:

Possible site: 49 

5 N- terminal signal sequence 

- 376 ( 360 - 376) 



Final Results 

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

= 25/345 (7%) 

Query: 24 MKILVAIDSFKGSVTSPELNTSVAQALLSVDKQLVIETRAIADGGEGSLVALSQTVAGRW 83

MKI++A CS+K S+++ E+ ++ + + + +flDGGEG++ A+ G

Sbjct: 28 MKIVIAPDSYKESLSRSEVAQAIEKGFREIFPDAQYVSVPVJUDGGEGtVEftMIiyVTQGAE 87

Query: 84 HQVKTIDLLRRPIKVAY--yRHaKQAFIESflSIIGIDKITSNSVTYAQATSYGLGIAVKD 141

L + ++ K AFIB A+ a-H+ + + TS G G + 

Sbjct: 88 RHAVOTGPL6EKSWASWGISGDGKTflFIE^1AAASGLEI^VPAEKRDPLVTTSRGTGKI^ILQ 147 

Query: 142 AIQKGATQIEIMLGGTGTSDGGRGFLESUSTDFMT GRSYLDTLftSPVTLLGL 193 

A++ GAT I I +GG+ T+DGG G G L+TL + + + GL 

Sbjct: 148 ALESGATNIIIGIGGSATtnXSGAGMVQADGAKLCnaNGIffilGFGGGSni^ 206 

Query: 194 T DVTNPYHGPQGFAAVFGPQRGGSLSQIEETDQIASNFAKKVFCQTTI 241 

DVTNP G G + +FGPQKG S + I E D S++A+ + + 
Sbjct: 207 DPRLKDCVIRVACDVTNPLVGDNGASHIFGPQKGASEaMlVELDNNLSHYAEVIKKADHV 266 

Query: 242 DLQTIPGSGAAGGLGGaiV-I.LGGTLTSGFSRIAELLNLDNSLQSCDLVITGKGCLDTQS 300 

D++ +PG+GAAGG+G A++ LG L SG + LNI.+ + C LVITGEG +D+QS 
Sbjct: 2S7 DVKDVPGAGAAGGMGAALMAFLGftELKSGIEIVTTALNLEEHIHDCTLVITGKGRIDSQS 326 

Query: 301 QSGKVPVAIAKMAKKYQVPTIALCGSVKIETGuAAEDFL-AVFSI 344 

GKVP+ +A +AKKy P I + GS+ + G+ + 4 AVFS+ 
Sbjct: 327 IHGKVPIGVANVAKKYHKPVIGIAGSLTDDVGWHQHGIDAVFSV 371 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 128/379 (33%) , Positives = 194/379 (50%) , Gaps = 23/379 (6%) 

Query: 1 MKVWAIDSLKGSLSSLEAGNAIKESINEVISGADVm'HPLADGGEGTVEALTLGMGGTI 60

MK++VAIDS KGS++S E ++ +++ V +E +ADGGEG++ AL+ + G 

Sbjct: 24 MKILVAIDSFKGSVTSPELNTSVAQALLSVDKQLVIETRAIADGGEGSLVALSQTVAGRW 83

Query: 61 ETIPVKGPLGEKVHASYGIIPQRQLAIIEMAAAAGITLIATEERNPLHTTTYGVGEMIKD 120 

+ L + +Y + A IE A+ GI I + T+YG+G +KD 

Sbjct: 84 HQVKTIDLIJ?RPIKVAY--YRHaKQAFIESASIIGIDKITSNSVTyflQATSYGIiGIAVKD 141 

Query: 121 AISKGCRHFlIGIGGSAmiGGAGMLQALGYAIJJJKnKQEISIBAQGI^ 180 

AI KB I +GG+ T+DGG G L++I1 Y + G + L ++++ + 

Sbjct: 142 AIQKGATQIEIMLGGTGTSDGGKGFLESMJYDFMT GRSYLDTLASPVTL 190 

Query: 181 EELKECDFKIACDVTNPLCGAQGCSSIFGPQKGADEDMITK^roTWLSNYATIlATSVSEKA 240 

L DVTNP G QG +++FGPQKG I + D SN+A + 

Sbjct: 191 LGLT DVTNPYHGPQGFAAVFGPQKGGSLSQIEF.TDQIASNFAKKVFCQTTID 242 

Query: 241 DATIEGTGAAGGLGFAFLAFTNATLEPGIDIILSEIKIESAISEADL\'VTGEGRLDGQTV 300 

TI G+GAAGGLG A + TL G I +N++ ++ DLV+TGEG LD Q+ 

Sbjct: 243 LQTIPGSGAAGGLGGA-IVLLGGTLTSGFSRIAELLKLDNSLQSCDLVITGEGCLDTQSQ 301 

Query: 301 MGKAPIGVAKIiAKKYGKKOTAFSGSVTEDAILCNQHGIDAFFPIVRRliISLDEAMSKEVA 360 

GK P+ +A++AKKY +A GSV + L + +AFI++ ISL+ A+ K 
Sbjct: 302 SGKVPVAIARMAKKXQVPTIALCGSVKIETGLAAEDFL-AVFSIQQQPISLEAAIDKTTT 360 

Query: 361 YKNMKETATQVFRLINLYN 379 

N+K A + LI +N 
Sbjct: 361 LSiSHKILAAHLMLIiIAQEN 379 

SEQ ID 6678 (GBS409) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 7; MW 45.4kDa). 

GBS409-His was purified as shown in Figure 214, lane 6.

GBS409d was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 166 (lane 3 & 4; MW 35kDa) and in Figure 188 (lane 12; MW 35kDa). Purified protein is
shown in Figure 240, lanes 9-10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
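
The alignment summaries quoted in these examples (for instance "Identities = 128/379 (33%)") can be checked mechanically. The following is a minimal Python sketch and not part of the original analysis: the function names are hypothetical, and it assumes the reported identity and gap percentages are whole numbers obtained by truncation, which matches the figures quoted for this alignment but is not stated in the text.

```python
import re

# Matches summary fields such as "Identities = 128/379 (33%)".
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_alignment_stats(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in STAT_RE.findall(line)
    }


def truncated_percent(count, length):
    """Whole-number percentage, truncated (an assumption, see the lead-in)."""
    return count * 100 // length


# Summary line of the GAS/GBS alignment in Example 2160, with OCR spacing cleaned up.
line = "Identities = 128/379 (33%), Positives = 194/379 (50%), Gaps = 23/379 (6%)"
stats = parse_alignment_stats(line)

count, length, reported = stats["Identities"]
identity_consistent = truncated_percent(count, length) == reported
```

Recomputing the percentage from the raw counts in this way is a quick check for digits that were misread during scanning.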

Example 2161 

A DNA sequence (GBSx2278) was identified in S.agalactiae <SEQ ID 6681> which encodes the amino 
acid sequence <SEQ ID 6682>. Analysis of this protein sequence reveals the following:

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1886 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21771 GB:U32695 conserved hypothetical protein [Haemophilus influenzae Rd]

Identities = 97/383 (25%) , Positives = 175/383 (45%) , Gaps = 52/383 (13%) 

Query: 1 MKLHKQIAQQITOSIMWCQQDINFIOTKGIIFASTNPKRVGEFHEIGLKSffiQPC^ 60 
M+Ii K A++IV + +N ++ G+I AS N R+ + H + + +++E+ 

Sbjct: 1 MQI£)KOTAKKIVKRflMKIIHHSVNVmHDGVIIASCa3STRIJIIQRHT6A 60



Query: 61 TD---QESYFGTQAGINIPFYSNCEIiIATIGISGNPNQVGKyALLAQKMTRIiI]:)KEHE-L 116 

Q+ F Q GIN+P +Y + + +GISG P QV +YA L + L 
Sbjct: 61 DQALAQKWNFEaQPGINLPIHYLGKNIGWGISGEPTQVKQyAELVKMTAELIVEQQALI. 120

Query: 117 DYLDFGRKNEASIVLHHLVEGRELDYYYIjNQFIiNQYHLSEKTDYRLLTFEINSQKQKLLL 176 

+ + R+ + +I1 L+ LN + ++ + +F++N + +L+ 

Sbjct: 121 EQESWHRRYKEEFILQ LLHCNI.NWKEMEQQA- - KPFSFDLNKSRVWLI 167 

Query: 177 S QSEMSLIiNFFDK LDTAIYTFNYPNQYWLLLSDHMFDYYYPNI 219 

+ +L+N+ ++ LD + + N +LS M 
Sbjct: 168 KLMIPALiaqLQNLIMYLEQSEERQDVAILSLDQVVVLKTWQHS--TVLSAQM KT 219 

Query: 220 LSKFECEKGriYKUGIGQKSSLSLLKR---SYETSIIiALK-ALKGQQK--VNI.VDDLDI,EL 273 

L + K YK+ +G H-Ii L ++ S++++ L LK + + + D+ L + 
Sbjct: 220 LLPADYSKQDYKIAVGACUSniiPLFEQLPLSFQSAQSTLSYGLKHHPRKGIYVFDEHRLPV 279 

Query: 274 LLTSIDSN1JCQYVIOTALVNL-SEISIDKIL---IJSSYFKHNLSLKECSQELFIHKNTVQYR 329 

LL + ++ BKLLSE + IL LYFNL +++LF+H NT++YR 
Sbjct: 280 LLAGLSHSWQGNELIKPLSPLFSEENAILYKTLQQYFLSNCDIiYLTAEKLFVHENTIiRYR 339 

Query: 330 LNKIYESTQLNPRNFKDATLLYL 352 

LNKI + T L D LYL 

Sbjct: 340 UJKIEQITGLFFNKIDDKLTLYl 362 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2162 

A DNA sequence (GBSx2279) was identified in S.agalactiae <SEQ ID 6683> which encodes the amino
acid sequence <SEQ ID 6684>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0290 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 FPKHFLWGGAV24AKQVEGAFRTD3KGLSVQD^/LPNGGLGD FTAKPTPDNLKLE 56 

FP++FI.WGGa. AMJQ EGA+ D3KGLSVQDV P GG+ T KPT DNLKCj 

Sbjct: 6 FPENFLWGGATaaNQFEGAYNQDGKGLSVQDVTPKGGVAQSGSSSPLITEKPTEDNLKLV 65 

Query: 57 AIDFYHNYKNDIKLFAEMGFKVFRTSIAWSRrFPNGDDEAPI!IEAGr,QFYDNLFDELLKYN 116 

IDFY+ YK DI LFAEMGFKVFR SIAW+RIPPNGED PNEAGL FYD +FDEL KY+ 
Sbjct: 66 GIDFYNRYKEDIALFAEMGFKVPtlLSIAWTRIFPNGDDLEPNEAGriAFYDKVFDELAKYD 125

Query: 117 IEPLWLSHYETPLHIJUCrYNGWADRELIAFF3KPACTVT<IERY}aDmc™LTFNEVNSIL 176 

IEPLVTLSHyETPLHLA+ YNGWa+R LIAF+E++A+TV RYKDKVKXWLTENEVNS+L , 
Sbjct: 126 lEPLVTIiSHYETPLHIJUlKXNGVmNRELIAFYERXJVRTVFTRYKDKVK^ 185 

Query: 177 HMPFTSGAIMTDKSQLSPQELYQRIHHELVASARVTKLGRSINENFKIGCMILaMPAYPM 236 

H PF SG I+TD QLS Q+BYQA+HHELV SA TK+G INP+FKIGCM+IiAMPAYPM 
Sbjct: 186 HAPFMSGGIITDPEQLSKQDLYQAVHHELWSALATKVGHEINPDFKIGCMVIiAMPAYPM 245 

Query: 237 TSDPRDVLAARQFEQHNLLFSDI^IVRGICYPTYIQSYFKNNGIKIKPEEGDEEVLAQNTVD 296 

T4DP D LA R+FE N LFSD+H RGKYP YI+ YFK+N I IK EGD+E++ +NTVD 
Sbjct: 246 TADPLDQLAVREFENQNYLFSDLHARGKYPNYIKRYFKDNNIDIKMGEGDKELMLENTVD 305 

Query: 297 FLSFSYYMSVTQAYDFENYQSGQGNILGGLTNPHLTTSEWGWQIDPIGLRLVENQYYERY 356 

F+SFSYYMSV A++ E+Y SG+GN+LGGL+NP+L SEWGWQIDP+GLRLVLN Y+RY 
Sbjct: 306 FlSFSYYMSVAAAHNPEDYNSGRG!WLGGI,SNPYLQaSEWGWQIDPVGLRLVLNDSYDRy 365 

Query: 357 QIPLFIVENGLGRKDQLIETI£)GDYTVEDDYRIDY™QHBVQVAKAIEDGVEIMGY^^ 416 

Q+PLPIVENGLGAKD L++ DQ TVEDDYRIDY+ +HL+QV +A++DGV+++GYT+WG 
Sbjct: 366 QLPLFIVENGJU3AKDVLVQ6PDGP-TVEDDYRIDYLQKHLMQVGEAIX2DGVDLLC3YTTWG 424 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5287> which encodes the amino acid 
sequence <SEQ ID 5288>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0763 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 390/469 (83%) , Positives = 423/469 (90%)

Query: 1 MTVFPKHFLWGGAVAANQVEGAFRTDGKGLSVQDVLPNGGLGDFTAKPTPDNLKLEAIDF 60 

M +FPK FLWGGAVAANQVEGAF D KGI,SVQDVLPNGGLG++T PT DNL LEAIDF 
Sbjct: 1 MGIFPKDFLWGGAVAANQVEGAFEADAKGLSVQDVLPNGGLGEVJTDSPTSDNLTLEAIDF 60 

Query: 61 YHNYKlTOIKLFAEMGFKOTRTSIAWSRIPEliraDDSAENEAGLQFYDNLFDEIiONIEPL 120 

YH YK DI LFAEMGFKVFRTSIAWSRIFENGDD PNEAGLQFYD+IiFDELL Y lEPL 
Sbjct: 61 YHRYKEDIALFAEMGFKVFRTSIAWSRIFPNGDDDQPNEAGLQFYDDMEIiIiSIYGIEPri 120 

Query: 121 VTLSHYETPI^iAKTYNGWADRRLIAFFEKPAQTVMERYKDKVKXT^ 180 

VTLSHYETPUILRK YHGW DRRLI FFE+FAQTVMERYKDKVKYWLTFNEVMSILHMPF 
Sbjct: 121 VTLSHYETPIfflAKAYNGWTDRRLIGFFERFAQTOffiRYKDKVmnJTENEW^ 180

Query: 181 TSGAIMTDKSQLSPQELyQaiHHELVS.SARVTKLGRSINPlIFKIGCMIIAMPAYPM'rSDP 240

TSG IMT+K +IiS Q+IiYC3RIHHELVASA VTKL INP+ K+GCMILAMPAYPMTSDP 
Sbjct: 181 TSGGIMTEKEKLSLQDLYQAIHHELVASASVTKLAHEINPDVK'v/GCMILRMPAYPMTSDP 240 

Query: 241 RDVLAARQFEQHNLLFSDIHWGKYPTYIQSYFKIINGIKIKFEHlGDEEVIiAQNTVDFLSF 300 

RD+LAA FE MLLFSDIHVRGKYP+YI+SYFK NGI + I FE+GD+E+LA++TVDFLSF 
Sbjct: 241 RDIIiAAHAFENLOTjLFSDIHVRGKYPSYIKSYFKENGIEIVFEDGDKELLAEHTVDFLSF 300 

Query: 301 SYYMSVTQRYDFENYQSGQGJSni^LTNPHLTTSEnSIGWQIDPIGLRLVIlIQYYERYQIP 360 

SYYMSVTQR++ E Y SGQGNIllGGL+NP+L +SEWGWQIDPIGLRLVIUQYY+RYQIPL 
Sbjct: 301 SYYMSVTQaHNPEaYTSGOSNIIiGGLSNPYLESSEWGWQIDPIGIjRLVLNQYYDRYQIPri 360 

Query: 361 FIVENGLGRKDQLIETLDGDYTVEDDVRIDYMNQHLVQVAKAIEDGVEIMGYTSWQCIDC 420 

FIVENGLQRKDQL++T DG TV DDYRIDYM+QHLVQVAKAIEDGVE+MGYTSWaCIDC 
Sbjct: 361 FIVENGLGAKDQLVQTADGSMTVHDDYRIDYMSQHLVQVAKAIEDGVEVMGYTSWGCIDC 420 

Query: 421 VSMSTAQLSKRYGLIYVDRNDDGTGSLQRYKKKSFGWYQKVIKTNGQSL 469
VSMSTAQIiSKRYG lYVDRNDDGTG L RYKKKSF WY++VH-TNG+ L 
Sbjct: 421 VSMSTAQLSKRYGFIYVDRNDDGTGQLTRYKKKSFDWYRQVIQTNGRYL 469 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 2163 

A DNA sequence (GBSx2280) was identified in S.agalactiae <SEQ ID 6685> which encodes the amino 
acid sequence <SEQ ID 6686>. Analysis of this protein sequence reveals the following:



Possible site
>>> Seems to ... N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane 247 - 253 ( 241 -
INTEGRAL Likelihood = Transmembrane 429 - 445 ( 424 -
INTEGRAL Likelihood = Transmembrane 285 - 301 ( 280 -
INTEGRAL Likelihood = Transmembrane 207 - 223 ( 205 -
INTEGRAL Likelihood = Transmembrane 113 - 129 ( 112 -
INTEGRAL Likelihood = Transmembrane 309 - 325 ( 305 -
INTEGRAL Likelihood = Transmembrane 395 - 411 ( 395 -
INTEGRAL Likelihood = Transmembrane 174 - 190 ( 173 -



Final Results 

bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:C3«^84286 GB:Z34526 beta-glucoside permease [Bacillus subtilis] 
Identities = 225/594 (37%) , Positives = 351/594 (58%) , Gaps = 11/594 (1%) 

Query: 4 YQETAKaiLAaVCffiEKNrQHVTHCVTRLRLVIJDNDEI\nroQVIKTIENVIGVMl^ 63

Y + +K IL VGGE+N+Q V HC+TRLR L ++ + ++ +P V+G +Q+Q

Sbjct: 3 YDKLSKDILQLVGGEENVQRVIHC^ra^LRFNLHDlSIAKADRSQLEaLPGVMGTNISGEQPQ 62

Query: 64 I ILGNDWNNYylIaFLALGHE™^TREFSSQKKSSILEKLIETIAGVITPLIPALLGGG^IL 123

II+GNDV Y A + + + SS +K ++L + + I+GV TP++PA+ G GM+

Sbjct: 63 IIIGmWPKVyQ&IVRHSNLSIJEKSAGSSSQKICNVLSAVFDVISGVFTPlLPAIAl^^ 122

Query: 124 KVIGILLPiyn^IASSSSQTV2\FINFFGDaAYYFMPIMIAYSAASRFK\n:PVLAATVGGIL 183

K + L G + SQ + GD A+YF+P+++A SAA +F P +AA + +

Sbjct: 123 KGLVALAVTFGVJMAEKSQVHVILTAVGDGAFYPLPLLLAMSAARKFGSNPYVAAAIAAAI 182

Query: 184 LHPAFVTMVAEGKPLSLFGAPUTIASYGSSVIPILIMVFLMQYIERWINKIVPSVMKSFL 243

LHP ++ GKP+S G PVT A+Y S+VIPIL+ +++ Y+E+WI++ + +K +

Sbjct: 183 LHPDLTALLGAGKPISFIGLPVTAATYSSTVIPILLSIWIASYVEKWIDRFTHASLKLIV 242

Query: 244 QPTLIILISGFLALVWGPLGVIIGKGLSSAMLSIYHVAPWLALSILGAIMPLVVMTOVIH 303

Query: 304 WAFAPIFLAASV&TPDVLILPAMLaSNLAQCSftMLRVAVKAKQKQT 363

+AF PI + +LPAM +N+ Q AS AV ++++ K+ + +A ++AL+ 

Sbjct: 303 YAPVPIMINNIAQNGHimiLPAMFIiraM(3Q2iGaSFAVFIiRSRinCK^ 361 

Query: 364 6ITEPALYGVTLKFKKPLYAAM1SGGLVGAYIGLWIASYTFWPSI1GLPQYINPQGGN 423 

GITEPA+YGV ++ KKP AA+1 G GA+ G+ +ASY +V GLP I G 
Sbjct: 362 GITEPA^WQV^ilMRLKKPFAAALIGGAAGGRFYG^ra3YASY--IVGGNAGLPS-lPVFlGP 418 

Query: 424 NFSNAVIAAIATIILTFIITWFLGIDEGEfqEKSSINAQEHTHIRSGLSKKETLYSPMVGN 483 

P A+I + + LG ++ ++ S Q H S +E ++SE+ G 
Sbjct: 419 TFIYAMIGI,VIAFAAETAAAYLLGFEDVPSEGSQ---QPAVHESS REIIHSPIKGE 471 

Query: 484 VLPLSKVPDETFSSKlLGEGLZaTPSVGEVYAPFDGEIISLFPTKHAIALKDDKGVEVLI 543 

V LS+V D FS+ ++G+G AI P GEV +P G + ++F TKHM + D+G E+LI 
Sbjct: 472 VKALSEWDGVFSAGW.GKGFAIEPEEGEWSPVRGSVTTIFKTKHAIGITSDQGJffilLl 531 

Query: 544 HIGIDTVEtNGEGFEQLVICVGDFVmGQIJiRMDIDPISSKGYSLISPVVVTNS 597 

HIG+DTV+L G+ F +K GD V G L+ D++ I + GY +I+PV+'rar+ 
Sbjct: 532 HIGLDTVKLEGQWFTRHIKEGDKWiPGDPLVSFDLEQIKAAGYDVITPVIVTNT 585 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2883> which encodes the amino acid 
sequence <SEQ ID 2884>. Analysis of this protein sequence reveals the following: 



Possible site: 20

N-terminal signal sequence

INTEGRAL Likelihood = -10 Transmembrane 246 - 262 ( 240 - 27i;
INTEGRAL Likelihood = -6 Transmembrane 264 - 300
INTEGRAL Likelihood = -4 Transmembrane 173 - 189
INTEGRAL Likelihood = -3 Transmembrane 112 - 128 ( 111 - 137)
INTEGRAL Likelihood = -2 Transmembrane 428 - 444 ( 425 - 445)
INTEGRAL Likelihood = -2 Transmembrane 383 - 399
INTEGRAL Likelihood = -1 Transmembrane 308 - 324



Final Results

bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 508/619 (82%), Positives = 561/619 (90%), Gaps = 1/619 (0%) 
Query: YQETAKAILAAVGGEKNIQHVTHCVTRLRLVLDHDEIVNDQVIKTIPNVIGVMRKNDQYQ 63

YQEiaKAILAAV(3G+ NIQ VTHCVTRLRLVL NDE V DQ +K I NVIGVMRKN QYQ

Sbjct: YQETAKAILAAVGGKTNIQRVTHCVTRLRLVLKiroEKVKDQQVKAISNVIGVMRK^ 62


Query: 64 IILGNirasmYYlilRFLRLGHFElTOTREFSSQKKSSILEKLIETIAGVITPLIPALimjM^ 123 

IILOSIDVIINYY AFL+LGHP+N + SS+ K SILE+LISTIAGVITPLIPALLGGGML 
Sbjct: 63 IILG^lDVN^^fYQAPLSLGHFDNQDEDHSSKAKGSIIlERLIETIAGVITPLIPALLGQGML 122 

Query: 124 KVIGILLPMLGIASSSSQTVAPINPPGDAAYYFMPIMIAYSAASRFKVTPVLAATVGGIL 183 

KV+GILLPMLG+-AS+ SQTVAPrNFFGDAAYyFMP+MIAYSAA+RFKVTPVLflAT+ GIL 
Sbjct: 123 KWGILLPMLGLASADSQTVAFINFFGDftAYYFMPVMIAYSAAARFKVTPVLAATIAGIL 182 

Query: 184 LHPAFVTMVAEGKPLSLFGAPVTLASYGSSVIPILIMVFLMQYIERWINKIVPSVMKSFL 243 

LHPAFV MVAEGKPL+LFGAPW ASYaSSVIPIL+!^/+LMQYIE+W+N++VPSVMKSFL 
Sbjct: 183 LHPAFVAIWAEGKPLTLFGAPVTPASYGSSVIPIL^1IWYLMQYIEKWVNRLVPSVMKSFL 242 

Query: 244 QPTLIILISGFLALVWGPLGVIIGKBLSSAfffiSIYHVAPWLALSILGAIMPLVVMTGMH 303 

QPTLIILISGFLALVWGPLGVIIG+GLS+ ML+IYHVAPWLRL+ILGAIMPLWMTGMH 
Sbjct: 243 QPTLIILISGFLALWVGPLGVIIGQGLa^maiAIYHVAPWLALAILGAIMPLVVMTGMH 302 



Query: 304 WAFAPIFLaASVATPDVLILPAMLASNLRQG2iASLAVAVKAKQKQTRQVAFAAGLSALLA 363 
WAERPIFLAASVATPDVLILPAMLASNLAGGAASLAVA K KQKQTRQVA AAG4SaLLA

Sbjct: 303 WAFAPIFI^SVATPDVLILPimiaSlSniAQGAASLAVAFKTKQKQTRQVALAAGISALLA 362

Query: 364 GITEPALYGVTLKFKKPLYAAMISGGLVGAYIGLVNIASYTFWPSIIGLPQYINPQGGN 423 

GITEPALYGVTLKFKKPLYAflMISGGLVGA+IG VNIASYTFWPSIIGLPQYINP GG 
Sbjct: 363 GITEPALYGVTLKFKKPLYAAMISGGLVGRFIGFVNIASYTFWPSIIGLPQYINPSGGA 422 

Query: 424 NFSNAVIftAIATIILTFIITWFLGIDEGENEKSSlNAQEHTHIRSGLSKKETLYSPMVGN 483 

WF+NA+ia ATI+L F +TWF+GIDE E+ K A + + ++SGLS K+TIiY+PM G 
Sbjct: 423 NFTmjIAGTATIVIAPSLTWMGIDE-ESPKQVSVAADMSQVKSGLSTKQTLYaPMraE 481

Query: 484 VLPLSKVEDETFSSKLIfiEGI^ITPSTOBVYAPPDGEIISLFPTKHAIALKDDKBVEVLI 543 

+L LS+VPDETFSSKLLGEG AI PS GEVYAPFDGE+I+ FPTKHA+ALK+ +GVEVLI 
Sbjct: 482 MLFLSEVPDETFSSKIiLGEGFRILPSEGEVYAPFDQEVITFFPTKHAVaLKNTRQVEVLI 541 

Query: 544 HIGIDTVEENGEGFEQLVKVGDFVKRGQLLIJa©IDFISSKGYSLISPVVVTO 603 

H+6IDTVEL G+GFEQLV VGD VKRGQ LL+MDIDFI+SKGYSLISFVWnsrS +QrJEI 
Sbjct: 542 HVGIimTELKGQGFEQLVSVSDVVKRGQALLKmiDFITSKGYSLISPVVVTHSftEQm 601

Query: 604 IVKDAETMVTNEDDLLVIL 622 

I++D + MVT ED LLVIL 
Sbjct: 602 IIQDDKKMVTKEDALLVIL 620 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2164 

A DNA sequence (GBSx2281) was identified in S.agalactiae <SEQ ID 6687> which encodes the amino 
acid sequence <SEQ ID 6688>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1148 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Identities = 118/275 (42%) , Positives = 183/275 (65%) 










K++LGK r.++ +y++LTDHI+ AI+R+++G+ I+N L 



E +R Y DE++IG -rAL ++K++ G+ L DE+ FIA+H VNA L+ 



+Y+++I:. S ELLYLT+H++R+VK

A related DNA sequence was identified in S.pyogenes <SEQ ID 6689> which encodes the amino acid
sequence <SEQ ID 6690>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0680 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 220/279 (78%) , Positives = 246/279 (87%) 



MIIiOiVIiNHNAVlSVTHQGLDVIiWGKGIAFKKRIGDRINSDAIEKSFVLKNSDNMNRFT 60 
M+IKRVLNHNA IS HQGLD+LIiMGKGI F K++GD I +AIE SFVLKNSDNMNRFT 
MLI KRVLNHNAAISTNHQGLDILLMGKGITFGKKVGDS lELNAIETSFVLKNSDNMNRFT 6 0 



Query: 121 EIQRYYPDEYSIGMKALELIKDELGICLTIDESAFIAMHFVlCWSrmPEtlEAHKITEIVS 180 

EIQRYYPDEYS+G+KAI£!LI+ LG+ L IDE+AFIAMHFVNR LD PF E H++TEIVS 
Sbjct: 121 EIQRYYPDEYSLGVKAIiELIERNLGVTIiAIDEAAFIMraFVNASLDTPFKEPHRLTEIVS 180 

Query: 181 YIEQKOTCIDFRTELDESSIDYYRFMTHTKLKACJRVLSGMKYEDDDftDLIiLVVKKKYPREY 240 
YIEQK+K DF+TELD++SIDYYRFMTH KLFAQRVLS M Y+DDDA+LLLWK KYP+EY 

Sbjct: 181 YlEQKIKTUt'KTEI.DDTSinYYRFMTHIKLFAQRVLSQMSYDDDDAELLLWKrKYPKEY 240 

Query: 241 KCVKEIGMNMAIQYQYQLNSSELLYLTVHVKRLVKNLKE 279 

+CV +1 + +Y Y LNSSELLYLTVHVKRLVK+LKE 
Sbjct: 241 RCVLDISEEIKKRYNYHHiNSSEIiliYLTVHVKRIjVKHLKE 279 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2165 

A DNA sequence (GBSx2282) was identified in S.agalactiae <SEQ ID 6691> which encodes the amino 
acid sequence <SEQ ID 6692>. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1104 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9335> which encodes amino acid sequence <SEQ ID 9336> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6693> which encodes the amino acid 
sequence <SEQ ID 6694>. Analysis of this protein sequence reveals the following:

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3314 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 143/178 (80%) , Positives = 161/178 (90%) 

Query: 1 OTLHHDKHHRTYVMiaNAMiEKHPEIGEEIIiEMIiaiJVSQIPEDIRQaVIimGOT 60

MTLHHDKHHATYVAN NftMiEKHPEIGE+IiE LLADV+H-IPEDIRQ +IMNGGGHIiNHAL 
Sbjct: 24 MTimDKHHATWMWNAALEKHPEIGEraSSIJIiRDVTKIPEDIROT^ 83 

Query: 61 JTOLMSPEETQISQELSEDINATJKMFEDFKMFTAAATGRPGSGKAWLVVNAEGKIiEVL 120 
FWEL+SPE4 ++ ++++ 1+ FeSF+ FK FTARATGRPGSGWRWLWH EG+LE+

Sbjct: 84 FWELLSPEKQDVTPDVAQAIDDAPaSFDAFKEQFTAaATGRPGSGWAWLWNKEGQLEIT 143 

Query: 121 STANQDTPIMEGKKPII^DVWEHAYYLNYRNVRElSrriKWFPEIINM^^ 178 
STANQDTPI EGKKPIL I£(\«ffiHAYyiJSYRNVRPNYIKAFPEI+lW KV+ELYQRAK 
Sbjct: 144 STANQDTPISEGKKPILRLimJEHAYYIiNYRNVRENYIKRFFEIVlWK^ 201



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2166 

A DNA sequence (GBSx2283) was identified in S.agalactiae <SEQ ID 6695> which encodes the amino
acid sequence <SEQ ID 6696>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3331 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
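
The "Final Results" blocks in these examples list a certainty value for each candidate localization. As a minimal illustrative sketch (not part of the original analysis), the Python below parses such a block and picks the localization with the highest certainty. The function names and the regular expression are assumptions made for illustration, and the sample text is the block from Example 2166 with the OCR spacing cleaned up.

```python
import re

# One result line looks like:
#   bacterial cytoplasm --- Certainty=0.3331 (Affirmative) < succ>
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (localization, certainty, verdict) tuples."""
    return [
        (m.group(1), float(m.group(2)), m.group(3))
        for m in RESULT_RE.finditer(text)
    ]


def best_localization(results):
    """Pick the entry with the highest certainty value."""
    return max(results, key=lambda entry: entry[1])


sample = """
bacterial cytoplasm --- Certainty=0.3331 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(sample)
top = best_localization(results)
```

Here `top` holds the cytoplasm entry, in line with the "Affirmative" verdict reported for this protein.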

Example 2167 

A DNA sequence (GBSx2284) was identified in S.agalactiae <SEQ ID 6697> which encodes the amino
acid sequence <SEQ ID 6698>. This protein is predicted to be DNA polymerase III delta subunit. Analysis
of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0511 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9743> which encodes amino acid sequence <SEQ ID 9744>
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6699> which encodes the amino acid 
sequence <SEQ ID 6700>. Analysis of this protein sequence reveals the following: 



WO 02/34771

PCT/GB01/04789



0 - 266 < 249 - 266) 

Final Results 

bacterial membrane --- Certainty=0.14S9 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/340 (65%) , Positives = 282/340 (82%) 

Query: 1 MIAIEEIGRITPDNLGLVTVLAGEDLGQYAQMKEKLFQVIGFNKDDLAYSYFDLSEEDYQ 60 

MIAIE+I +++ +NLGL+T++ G+D+GQY+Q+K +L + I F+KDDLRYSYFD+SE YQ 
Sbjct: 1 MIAIEKIEKLSKENLGLITIiVTGDDIGQYSQLKSRLMEQIAFDKDDLAYSYFDMSEAAYQ 60 

Query: 61 NMIiDLESI^FLSDYKWIFDQFQDITTDKKTYLDEQAMKRFEAYLQNFVDTTRLVICft.P 120 

+AE+DL SLPF ++ KWIPD DITT+KK++L E+ +K FEAYL+NP++TTRL+I AP 
Sbjct: 61 naEMDr.VSLPFFAEQKWIFDHLLDITTNKKSFLKEKDI.KAFEAYi:iE^ 120 

Query: 121 GKLDGKRRLVKXJiKEIfflRVLEaOTLKESDLKTYFQKyjUIQEGLVFERG^ 180 

GKLD KRHLVKLBKEDA VLERN LKE++I1+TYFQKY+HQ GL FE+G FD+LL+KSN D 
Sbjct: 121 GKLDSKRRLVKLLKRDALVLEANPLKEAELRTYFQKYSHQLGLGPESGAFDQIiLLKSNDD 180 

Query: 181 FSDTLTNIAFLKSYKTDGHISSNDVREAIPKSLQDNIFDLTQDVLLGRIDLARDLVRDLR 240 

FS + N+AFLK+YK G+IS D+ +AIPKSLQDNIFDLT+ VL G+ID ARDL+ DLR 
Sbjct: 181 FSQIMKNMAFLKAYKKTGHISLTDIEQAIPKSLQDNIFDLTRLVLGGKIDAARDLIHDLR 240 

Query: 241 LQGEDEIKLIAIMLGQFRMFLQVKILASKGKSESQIVSELSHYIGRKINPYQVKFAVRDS 300 

L GED4-IKLiaiMLGQFR+FLQ+ ILA K+K Q+V I,S +GR++NPYQVK+A++DS 
Sbjct: 241 LSGEDDIiajIAIMIX3QFRI.PLQLTIIARDVKIffi;QQLVISLSDrLGRRVNPyQVKKaLKDS 300 

Query: 301 EMLPLAEtKEAIRlLIETDXAlKRGTYDKDYLFDLALLKI 340 

R L lAFL A++ LIETDY IK G Y+K YL D+ALLKI 
Sbjct: 301 RTLSLAFLTGAVKTLIETDYQIRTGLYEKSYLVDIALLKI 340 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2168

A DNA sequence (GBSx2285) was identified in S.agalactiae <SEQ ID 6701> which encodes the amino
acid sequence <SEQ ID 6702>. Analysis of this protein sequence reveals the following: 

0 N-terminal signal 



Final Results 

bacterial cytoplasm --- Certainty=0.3071 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2169 

A DNA sequence (GBSx2286) was identified in S.agalactiae <SEQ ID 6703> which encodes the amino 
acid sequence <SEQ ID 6704>. This protein is predicted to be esterase. Analysis of this protein sequence 
reveals the following:

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -0.32 Transmembrane 175 - 191 ( 175 - 191) 

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB17013 GB:L38252 esterase [Acinetobacter lwoffii]
Identities = 63/218 (28%) , Positives = 107/218 (48%) , Gaps = 3/218 (1%) 

Query: 105 KVIFYWGGSYIHQRSELQYIFVNKIJUCKIjmWFPIYPKAPTYOTStaiPKI 164 
++IF++HGG++ + + LA + +V+ YP AP + Y +AI I +YQ

Sbjct: 73 QLIFHIHGGAFFLGSIOTHRAUOTIAARTQMQVIHVDYPLAPEHPYPEAiraiFDVYC!^ 132 

Query: 165 TLASVTSPKQIILVGESAGGGLAIGIiaDNLVTEHIKQPKEIILISPWIDIATlTOKIE^ 224 
L PK 11+ G+S G tiRL L L + P +IIi+SP+i:iD+ + + 

Sbjct: 133 LLVQGIKPKDIIISGDSCGAlimAIALCIJUiKQQPELMPSGLILMSPYI^ 192

Query: 225 QKKDPLLKMQLQQVAPYWANGKKNFKNPQVSPLYSSQENKMAPISFFIGTHDIFYPDNQ 284 

QK D LL LQ H-+ +P+VSPIi-l- + + P +G+ +1 D++ 

Sbjct: 193 QKHDALLSIEALQAGIKHYLTDDIQPGDPRVSPLF-DDLDGLPPTLVQVGSKEILLDDSK 251 


Query: 285 LLHQKLAKENIKHHYIVGQKMNHVYPVLP- - IPEAETA 320 

+K + ++K H+ + M H + + PEA+ A 

Sbjct: 252 RFREKAEQADVKVHFKLYTGMWHNFQMFNAWFPEAKQA 289 

There is also homology to SEQ ID 3498.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
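
The predicted membrane-spanning segments are reported on lines of the form "INTEGRAL Likelihood = -0.32 Transmembrane 175 - 191 ( 175 - 191)". The following Python sketch, which is illustrative only and not part of the original analysis, extracts the likelihood and the segment boundaries from one such line. The function names and the pattern are assumptions; the example line is the one given for SEQ ID 6704 above.

```python
import re

# Captures the likelihood score and the first residue range of a segment line.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segment(line):
    """Return (likelihood, start, end), or None if the line does not match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def segment_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


example = "INTEGRAL Likelihood = -0.32 Transmembrane 175 - 191 ( 175 - 191)"
segment = parse_tm_segment(example)
length = segment_length(segment[1], segment[2])
```

Lines whose likelihood or range was lost in scanning simply return `None`, so incomplete entries are skipped rather than guessed.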

Example 2170 

A DNA sequence (GBSx2287) was identified in S.agalactiae <SEQ ID 6705> which encodes the amino 
acid sequence <SEQ ID 6706>. This protein is predicted to be purine nucleotide synthesis repressor.
Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2970 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB16124 GB:Z99124 similar to transcriptional regulator (LacI
family) [Bacillus subtilis]
Identities = 111/300 (37%) , Positives = 175/300 (58%) , Gaps = 4/300 (1%) 



Query: 61 KIGVVIPHTRHPYFTQLIKGLLDAAKTTDYQLVMMPSDYWQELELSYLKQIiKMEAinALI 120 

+GV++P++ HP F +++NG+ AA +Y ++P++YN ++E+ YL+ L+ + ID LI 
Sbjct: 61 TVGVlLPYSDHPCFDKIVNGITKAAPQHEYATTLLPTNYNPDIEIKYLELIiRTKKIDGLI 120 

Query: 121 FrSRAISLDIIETYAKyGRIVVCEKLQEYim.SSAYLDRYSSFLEAFSDMKLRGLEHLVL 180 
TSRA D I Y +YG ■!-+ CE + + + A+ DR +++ E+F +K RG E++

Sbjct: 121 ITSRANHWDSIIAYQEYGPVIACEDTGDID-VPCAFNDRKTAYAESFRYLiKSRGHENIAF 179

Query: 181 LFSRl^lffiSSATYC3SALI^YQEVYGQLSSPy^m■GlWHDFlroG-IiNLSYQLVKEVSIDGIL 239 

R + S + AY+ V G+L +M+ 6 +D NDG L + + I 

Sbjct: 180 TCVREaDRSPSTADKaaAYKAVCGRLEDEHmSG-CNDMNDGBI^ 238

Query: 240 ATSDEVAaGLIKGYEESRKKCPYIICSQECLLVGQLI^PTIDHKSYYLGKLRFKQAIAEK 299 

A SDEVAAG I + + IIG+ + ++1. P++D LG AF L ++ 

Sbjct: 239 ANSDEVAM-IHLFAKKMMWDVEIIGEGNTSISRVLGFPSLDIfflLEQLGI^ 297 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2171 

A DNA sequence (GBSx2288) was identified in S.agalactiae <SEQ ID 6707> which encodes the amino 
acid sequence <SEQ ID 6708>. Analysis of this protein sequence reveals the following: 

0 N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3451 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



miKRIFCDMDGITjLNSEGQVSKSNATLIREAA---IPOTLVaARAPMEMKnAVD^ 57 
M K +F D +G'rLL S+ +S +1+ IP +SAR+P+ + L+ 

MMYKAVFSDFNG'ELLTSQHTISPRTWVIKRLTANGIPFVPISARSPLGILPYWKQLETN 60 

GVQVAKNGGLIYRIGDfraQ^/LPIHTQIIKKSTVKQLLRGIRFHFPQVSLSYYDLNNWYCD 117 
V VAF+G LI N + PI++ 1+ + ++ + H P + N+ + 

Sbjct: 61 NVLVAFSGALIL NQNLEPIYSVQIEPKDILEINTVLAEH-PLLGVNYYTNNDCHAR 115 



++I RS +LE+ H A K + ++ + E AFGD NDL MLE VG 



M MA ++IK A +T +N+EDG+ 
MGNAPNEIKQAANWTATNNEDQL 252 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2172 

A DNA sequence (GBSx2289) was identified in S.agalactiae <SEQ ID 6709> which encodes the amino 
acid sequence <SEQ ID 6710>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2854 (Affirmative) < succ>





The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2173 

A DNA sequence (GBSx2290) was identified in S.agalactiae <SEQ ID 6711> which encodes the amino 
acid sequence <SEQ ID 6712>. Analysis of this protein sequence reveals the following:

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq
Likelihood =- 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood » 
Likelihood = -2 
Likelihood = -1 
Likelihood = -0 
Likelihood = -0 



INTEGRAL 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



85 Transmembrane 247 - 



bacterial membrane --- Certainty=0.5203 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: MLQLTKyPPLKPIYLMiLVFQIYIJliVFSWTMLGCAELLFSFIFLIYQYDRETIFKTIAIV 60
MLQ K F + IYL+ L+ +Y +FS + L +F + L Q+ ++ K + I

Sbjct: 1 MLQWIKNFSIPLIYLSFLLLWLYYAIFSASYLALLGFVFLLVCLFIQPPWKSAGKVLIIC 60



Query: 61 IFFLFYFLWQimimOTQYQRVPNHISQIKVRIDTISINGDVLSFQADASGNTYQAFYTLK 120 

P F+F++QN + Q + + + ++++ DT4- +NGD LSF+ AG +Q +Y L+ 
Sbjct: 61 GIFGFWFVFQNWQQSQASQNLADSVERVRILPDTVKVNGDSLSFRGKADGRIFQVYYKLQ 120

Query: 121 NKSEKDYFQNLDNNIMIIADIiOjEEAEERRHFNGFDYRQYLKRHGIYRIAKVTKIKQIRL 180 

++ EK+ FQL+ I + KLEE +R+F GF+Y+ YLK GIY+ + K1+ ++ 
Sbjct: 121 SEEEKEAFQALTDLHEIGLEGKLSEPEGQRNFGGENYQAYLKTQGIYQTLNIKKIQSLQK 180

Query: 181 FQHRSFFALMSKMRRSAIVISQT-FPNPMEHYMSGLLFGYLDKTFDDMSDLYSSLGIIHL 239 

+S RR A+V +T FP+PMR+YM+GLL G+LD F+H-M+H-LYSSLGUHL 
Sbjct: 181 IGSTOIGENLSSLRRKAVVWIKTHFPDPKKiranGLLLGHLDTDFEEMNELYSSLGIIHL 240

Query: 240 FALSGMQVGFFLGIFRYICLRIGLRLDHVWLLQIPFSLIYAGLTGFSISWEALIQSLLS 299 

FALSGMQVGFF+ F+ + LR+GL + + L PFSLIYAGLTGFS SV+R+L+Q LL+ 
Sbjct: 241 FALSGMQVGFFMNGFKKLLLRLGLTQEKLKWLTYPFSLIYAGLTGFSASVIRSLLQKLLA 300 

Query: 300 HSGVKKDENFALCLLICLISLPHSLLTTGGVLSFAYAFILTMTSFDHFSSIKKVAIESLT 359 

GVK +N AL +L+ I +P+ T GGVLS AYAFILTM S + +K VA ESL 
Sbjct: 301 QHGVKGLDNCALTVLVLPIVMPNFFFTAGGVLSCAYAFILTMPSKEG-EGLKAVASESLV 359 
Sbjct: 360 ISLGILPILSFYFAEFQPWSILLTFVFSFLFDLTFLPLLSILFVLSFLYPVIQLNFIFEW 419

Query: 420 LEVLLKWTGQLFPRPLIFGKPSLFLLIVMI I ILGLLYDYYHSKCFRYCSLLIIFTLFFIT 479
LE +++ Q+ RPL+FG+P+ +LLI+++I L L+YD + L+I LF +T
Sbjct: 420 LEGIIRLVSQOTSRPLVFGQPNTOLLILLLISLALVYDLRKNIKKLTVLCLLITGLFLLT 479

Query: 480 KNPITNEVAILDVG<K3DSILVRDWL6KTILIDTa3RW-FEQPEEWKQK™QSNAKKTLI 538
K+P+ NE+ +LDVGQG+SI +RD GKTILID GG+ +++ ++W++K+ SNA+R+LI
Sbjct: 480 KHPLENEITMLDVGQGESIFLRnVTGKTILIDVGGKAESYKKIKKWQEKMTTSISRQRSLI 539

Query: 599 AVKSIEAGDKIJWMGSYLQVIiYPWHKBDGKmrosiVLYGHLLGKGFLFTeDLEEEGEKQL 658
V+S+ 6+ L + GS L+VL P GDG ++D++VLyG h K FLFTG+LEE+GEK L
Sbjct: 600 KVRSMIVGENLPIFGSQLEVLSPRKMGDGGHDDTLVLYGKFLDKQFLFTQNLEEKSBKDL 659



A related DNA sequence was identified in S. pyogenes <SEQ ID 6713> which encodes the amino acid 
sequence <SEQ ID 6714>. Analysis of this protein sequence reveals the following:



Possible site: 29



INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



>>> Seems to have an uncleavable N-term signal seq



= -0, 



Transmembrane
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



394 - 410 



35S - 372 



325 - 341 



380 - 422) 



355 - 377; 



465 - 484 
325 - 347; 



441 - 458i 



----- Final Results -----

bacterial membrane --- Certainty=0.5076 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:AAC23742 GB:AF052208 competence protein [Streptococci
Identities = 311/706 (44%), Positives = 458/706 (64%), {



Query: 


5 


Sbjct: 


4 


Query: 


62 


Sbjct: 


64 




122 


Sbjct: 


121 




182 


Sbjct: 


181 


(Juery: 


242 


Sbjct: 


241 



WTKLVPLSKIQEAFLILVFFYQIHSPSWLTFL-LSLSLICLLVKRLSKK- -EFLGVFAIL 61 
W K +1 +PL+L +Y I S S+L L L+CL ++ K + L + I 

WIKNFSIPLIYLSFLLLWLYYAIFSASYIjaLLGFVFI^VCIiFIQFPWKSftGKVLIICGIF 63 

SFCSiLFLLyQKQQLVQKIjEIQPVQITSVftLVPDSIRINGDQiLAVLGRHGKHSYQLFYRLK 121 
F +F +Q+ Q Q L + V ++PD++++NGD L+ G+ +Q++Y+L+ 

GFWFVFQ^ro(X3SQASQNIJffiS- --VERWILPDTVKVNGDSLSFRGKAD6RIFQVYYKLQ 120 



- E RNF GFNYQ +L QGIY+ L+ 



h LSSLRR+A+V 



HFP PM +Y+TGLL G+LD F EM + YS LGIIHL 



FALSGMQVGFF+ F+++Lr. L + E +KH+ PF+ YA LTG+S SVIRSL+Q L 
Query: 302 HLGIK6LDNIACTFLLVFLWDAHFLMWGGVLTPSYAFLLTVVTVEELSGAKRQLVQVLT 361

G+KSLDN A T L++F+ +F . T GGVL+ +YAF+LT+ + H-E G K + L 
Sbjct: 301 QHGVKGLDNCaLTVLVLFIVMPNFFFTAGGVLSCAyAFILTMPS-KEGEGLKAVASESLV 359 

Query: 362 ISLGILPFLLFYFSSFNPMSMVLTGLLSYLFDLFILPLLCLVFCLSPLVTVSICNHLFIL 421 

ISLGILP L FYF+ F P S++LT + S+LFDL LPLL ++F LS Li V N +F 
Sbjct: 360 ISl,GILPILSFYFAEFQPWSIi:.LTFVFSFLFDLTELPLLSILFVLSFLYPVIQtNFIFEW 419 

Query: 422 LEKVIQPI.GNTFNSSLVFGSPTSWHLLILVISFAIFYDYRQ-VRQRVITCGLVIALTLLS 480 

LE +1+ + + LVFG P +W L++L+IS A+ YD R+ + C L+ L LL+ 

Sbjct: 420 LEGIIRLVSQVTSRPLWGQENTWLLILLLISIiaLVYDIJlKNIKKI.TVM 479 

Query: 481 VKYPLTNEVTPIDIGQGDSILVREOTGKNLLIDVOSR-PFFSSKEHWRRGHHVaN^ 539 

K+PL HE+T +D+GQG+SI +R+ TGK +LIDVGG+ + + W+ +NAQ++L 
Sbjct: 480 -KHPLENEITMIiDVGQGESIFIiRDVTGKTILIDVGGKAESYKKIKKWQEKMTTSNAQRSL 538 

Query: 540 IPYLKSRGIHTIDQLLVTHADTDHMGDIEWAKAIRIKEILTSQGSLSHPSFVRRLRRLK 599 

IPYLKSRG+ IDQL++T+ D +H+GD+ + KA + EIL S+ SL FV L+ + 
Sbjct: 539 IPYLKSRGVAKIDQLILTNTDKEHVGDLSEMTKAFHVGEILVSKDSLKQKEFVAELQATQ 598 

Query: 600 CHTOVLAAGDQLPIMGSVLQVLYPWQLGDGKNNDSLVLYGRLLNRTFLFTGDLEKEGENE 659 

VR + G+ LPI GS Ij+VIi P ++GDG ++D+LVLYG+ L++ FLFTG4LE++GE + 
Sbjct: 599 TKWSMIVGEraiPIFGSQLEVLSPRKMGDGGHDDrLVLYGKFLDKQFLFTGNLEEKGEKD 658 

Query: 660 IIKRYPQLRVDYLKAGHHGSNTSSSAAFLDHIQPKVaFISAGKNNR 705 

++K YP L4-V+ LKA HG+ SSS AFL+ +4-P++ IS GK+NR 
Sbjct: 659 LLKHYPDLKVNVLKASQHGNKKSSSPAFLEKLKPELTLISVGKSNR 704 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 346/743 (46%), Positives = 491/743 (65%), Gaps = 3/743 (0%) 

Query: 5 TBCYFPIiKPIYLALLVFQIYLriVFSWTMLGCAFIjLFSFIFLIYQYDRETIFKTIArVrFFL 64 

TK PL I A L+ + + S' + L L L+ + ++ AI+ F 

Sbjct: 6 TKLVPLSKIQFAFLILVFFYQIHSPSWLTFLLSLSLICLLVKRLSKKEFLGVFAILSFCR 65 

Query: 65 FYFLWONHNMNVQYQRVEbmiSQIKVRIDTISINGD^ 124 

+ L+Q + + + P 1+ + + D+I INGD L+ ++YQ FY LK+++E 

Sbjct: 66 LFLLYQKQQLVQKLEIQFVQITSVALVPDSIRINGDQLAVLGRHGKHSYQLFYRLKSQAE 125 

Query: 125 KDYFQNLDNNIMIIADIKLEEBEBRRHFNGFDYRQYLKRHGIYRIAKVTKIKQIRIiFQHR 184 . 

F+ +++ A + LE+AEE R+F 6P+Y+ +L GIYRI KV +I+Q+ + 

Sbjct: 126 AQLFKKEHRWLVMHAKVTLEKBEEVRNFKGFNYQTFLTYQQIYRIGKVEQIEQLEVISPE 185 

Query: 185 SFFAIMSKWRRSaiV-lSQTPENPIiJRHYMSGLLFGYI.DKTFDDMSDLySSLGIIHIiFALS 243 

S +S RR AIV Q FP PM HY++GLLFGYLDK+F +M+D YS LGIIHLFALS 
Sbjct: 186 SICDYLSSDRRRAIVHCQQHFPRPMSHYLTGIiLFGYLDKSFGEMTDYYSQLGIIHIjFALS 245 

Query: 244 GMQVGFFLGIFRYICLRIGIJUjDHVWLLQIPFSLIYAGLTGFSISVVRALIQSLLSHSGV 303 

GMQVGFFL FR + L + + L+ + +++PF+ YA LTG+SISV+R+L+QS L.H G+ 
Sbjct: 246 GMQVGFFLTCFRRVLLLIiAVPIiEWIKWIELPFACFYAALTGYSISVIRSLVQSQLRHIiGI 305 

Query: 304 KKDENFALCLLICLISLPHSimTGGVLSFAYAFILTMTSFDHFSSIKKVAIESLTVSVG 363 

K +N A L+ + H L+T GGVL+F+YAF+LT+ + + S K+ ++ LT+S+G 
Sbjct: 306 KGLDOT^CTFLLyFLVTOAHFIMTVGGVLTFSYAFLLTVTOTEELSGSUaiQLVQVLTISIfi 365 

Query: 364 ILPILTYyFSGFQPISIILTALLSFAFDIIFLFLLT\riF5/LSPIVKLSCINSLFEILEVL 423 

ILP L +YFS F P+S++LT LLS+ FD+ LFLL ++F LSP+V +S N LF +LE + 
Sbjct: 366 ILPPLLFYFSSFNPMSMVLTGLLSYLFDLFILPLLCLVFCLSPLVTVSICNHLFILLEKV 425 

Query: 424 LKWT6QLFPRPLIFGKPSLFLLIVMIIILGLLYDYYHSKC-FRYCSLLIIFTLFFITKNP 482 

+++ G F L+FG P+ + L++++I + YDY + C L+I TL + K P 

Sbjct: 426 IQFI^OT™SSLVFGSPTSViHLIiILVISFAIFYDYRQVRQRVITOGLVIALTLLSV-KYP 484 

Query: 483 ITNEVAIIOTGQGDSILVRDWISKTILIDTGGRVRFEQPEEWKQKWNQSNAKRTLIPYLK 542 

+TNEV +D+GQGDSILVR+W GK +LID GGR F E W++ + +NA++TLIPYLK 
Sbjct: 485 LTIffiOTFIDIGQGDSILVREWTCKNLLIDVGGRPFFSSKEHWRRGHHVANAQKTL 544 
Query: 543 SRGISKIDDLVITHTDTDHMGDMEVISKHFKWiRIilTSSGSLTNSQYVKHLSKIGVAVKS 602

SRGI ID L++TH DTDHMGD+EV++K ++ ++TS GSL++ +V+ L ++ V+ 
Sbjct: 545 SRGIHTIDQLLVTHADTDHMGDIEWAKAIRIKEILTSQGSLSHPSFVRRLERLKCHWV 604

Query: 603 lEAGDKLAVMGSYLQVIiYPWHKGDGKNNDSIVLYGHLLGKGFLFTGDLEEEGEKQLIiEAY 662

+ AGD+L +MGS LQVLYPW GDGKNNDS+VLYG LIi + FLFTGDIE+KGE ++++ Y 
Sbjct: 605 LAAGDQLPI^CSVLQVLYPWQLGDGKI)OTSI>VLYGRLLNRTFLFTGDLEKEGENEIIKRY 664 

Query: 663 EIttSVDILKAGHHGSKBSSSLSFLKKLSPSVVLVSAGKNmYC2HPHQETI^FQKIKSKI 722 

P L VD LKAGHHGS SSS +Fh + P V +SAGKMIIRYQHPH4-ETL R + + 
Sbjct: 665 PQLRVDYLKMHHGSlTOSSaaAFI^HIQPKVaFISAGKNimYQHPHHETLJU^ 724 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2174 

A DNA sequence (GBSx2291) was identified in S.agalactiae <SEQ ID 6715> which encodes the amino 
acid sequence <SEQ ID 6716>. This protein is predicted to be competence protein (comEA). Analysis of 
this protein sequence reveals the following:

Possible site: 38

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.77 Transmembrane 18 - 34 ( 14 - 36)

----- Final Results -----

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
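
The localisation lines of a "Final Results" block can be read mechanically. The following is a minimal sketch, not part of the original analysis: the function names parse_final_results and predicted_site are hypothetical, and the pattern assumes each line has the form "bacterial <site> ... Certainty=<value> (<call>)", tolerating stray spaces inside the number.

```python
import re

# Matches lines such as:
#   bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>
# Spaces inside the number (e.g. "0. 2508") are tolerated and removed.
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[0-9.\s]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples, in order of appearance."""
    results = []
    for match in RESULT_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results.append((match.group("site"), value, match.group("call").strip()))
    return results


def predicted_site(text):
    """Return the site with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda item: item[1])[0] if results else None
```

Applied to the three result lines above, predicted_site would return the membrane entry, since it carries the only non-zero certainty.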

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC23741 GB:AF052208 competence protein [Streptococcus pneumoniae]
Identities = 96/217 (44%), Positives = 138/217 (63%), Gaps = 4/217 (1%)

Query: 3 EIVLEKIKSHKWETTGIIVGLLDFGILGUraFG-THHKEDNMINI^K-KVSTITEKKV 60 

E ++EKIK +K +GIiL+ G L T KE HL + ++EK+V 

Sbjct: 2 EAIIEKIKEYKIIVICTGLGLLVGGFFLLKPAPQTPVKETNLQAEVaAVSKDLVSEKEVN 61 

Query: 61 MISHVKDKVSNQVTVDVKGAViraPGVYSLPSQSRVTDAIKRAGGLSIffiADSKSVNLAQKL 120 

+ + +TVDVKGAV PG+Y LP SR+ DA+++AGGL+ ADSKS+nLAQK+ 
Sbjct: 62 KEEKEEPLEQDLlTVDVKGAVKSPGIYDIiPVGSRINDAVQKAGGLTEQftDSKSUJLAQECV 121 

Query: 121 QDETVIYVAQKGEKITVVEEEKANNIATQGNSKGKINLNKADLSSLQTISGVGAKRAQDI 180 

DE ++YV KGE+ V 4-+ A+ + + K+NLNKA L L+ + G+G KRAQDI 

Sbjct: 122 SDEaLVYVPTKGEE--AVSQQTGIiGTASSISKEKKVimiKaSLEEtiKQVKG]:XMK^ 179 

Query: 181 LDYRDSQGGFKTIDDLKHVSGIGEKTLEKLRQDVTID 217 
+D+R++ G FK++D+LK VSGIG KT+EKL+ VT+D 

Sbjct: 180 IDHREANGKFKSVDELKKVSGIGGKTIEKLKDYVTVD 216 
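
The summary line that heads each alignment (Identities, Positives, Gaps) gives a count, an aligned length and a percentage, so the three can be checked against one another. The sketch below is illustrative only: parse_summary and suspicious_fields are hypothetical names, and the check assumes the reported percentage should equal either the truncated or the rounded value of count/length.

```python
import re

# One "Name = count/length (percent%)" group, with optional stray spaces.
PAIR_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)\s*%\)"
)


def parse_summary(line):
    """Map each field name to (count, aligned_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in PAIR_RE.findall(line)
    }


def suspicious_fields(line):
    """List fields whose reported percent matches neither the truncated
    nor the rounded value of 100 * count / length."""
    flagged = []
    for name, (count, length, percent) in parse_summary(line).items():
        exact = 100.0 * count / length
        if percent not in (int(exact), round(exact)):
            flagged.append(name)
    return flagged
```

For the line "Identities = 96/217 (44%) , Positives = 138/217 (63%) , Gaps = 4/217 (1%)" no field is flagged. A flagged field only indicates a value worth re-checking against the source; it does not say which of the numbers is wrong.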

A related DNA sequence was identified in S. pyogenes <SEQ ID 6717> which encodes the amino acid 
sequence <SEQ ID 6718>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.61 Transmembrane 22 - 38 ( 16 - 42)

----- Final Results -----

bacterial membrane --- Certainty=0.4843 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: ■ 

W Q++ A + +++ K+ EEK+E E I VD+KGAV+ G+Y L SR+ D + 
Sbjct: 42 NLQAEVAAVS-KDLVSEKEVNKEEKEEPLEQDLITVDVKGAVKSPGIYDLPVGSRINDAV 100 



Query: 102 ElAGGLTSEADKHAINFAEKLTDEQWYVPKQGEEISVLPRSIiVSGKKETASKDQSKVHI 161

+ AGGLT +AD ++N A+K++DE +VYVP +GEE + + G + SK++ KV++ 
Sbjct: 101 QKAGStlTEQADSKSUm^QKVSDEALVYVPTKBEE--AVSQQTGI^TASSISKEK-K^n^^ 157

Query: 162 NKASI.EEI,QHIPGIGRKRa(3DIinMRDKLGGFKM^LRQV-SG 220

NKASLEEL+ + G+G KRAQDIID R4- G FK++++L++VSGIG KT+EKLKD + +D 
Sbjct: 158 NKaSLEELKQVKGLGGKRAQDIIDHREANGKFKfiVDELKKVSGIGGKTIEKLKDYVT^ 216 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/166 (48%) , Positives = 111/166 (66%) , Gaps = 10/166 (6%) 

Query: 62 ISHVKDKVSNQ VTVDVKGAViraPGVYSLPSQSRVTDaiKRAGGLSNLaDSK 112

IS VK +VS + + VD+BOSAV GVY L + SRV D 1+ AGGri-)-+ RD 

Sbjct: 55 ISPVKQQVSEEKKEIQEDSSILVDLKGAVQKBGVYKLTASSRVRDVIEIAGGLTSEA^ 114 

Query: 113 SVNIJ^KLQDETVIYVAQKGEKITVVEBEKaiSINIA-TQGKfSKGKINIinC^ 171

++N A+KL DE V+YV ++GE+I+V+ + T + K+++NKA L LQ I G 

Sbjct: 115 AlNFAEKLTDEQWYVPKQGEEISVLPRSLVSGKKETASKDCJSKVHINKASLEElliGHIPG 174 

Query: 172 VGAKRAQDILDYRDSQGGFKTIDDLKNVBGIGEKTLEKLRQDVTID 217 

+GAKRAQDI+D RD GGFK ++DL+ VSGIGERrLEKL+ D+ +D 
Sbjct: 175 IGAKRAQDIIDMRDKLGGFKaLEDLRiJVSGIGEKiriEKLKDDIFLD 220 



A related GBS gene <SEQ ID 8989> and protein <SEQ ID 8990> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: 5.70
GvH: Signal Score (-7.5): -2.58

Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -3.77 threshold: 0.0

INTEGRAL Likelihood = -3.77 Transmembrane 18 - 34 ( 14 - 36)
PERIPHERAL Likelihood = 10.40 73

modified ALOM score: 1.25



* Reasoning Step: 3 

----- Final Results -----

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

44.3/64.1% over 215aa
Streptococcus pneumoniae
GP|3211753| competence protein Insert characterized
ORF01930(304 - 951 of 1014)
GP|3211753|gb|AAC23741.1||AF052208(1 - 216 of 216) competence protein {Streptococcus pneumoniae}
%Match = 25.0
%Identity = 44.2 %Similarity = 64.1
Matches = 96 Mismatches = 75 Conservative Sub.s = 43

5 90 ■ 120 150 180 210 240 270 300 

DDGKKIjNPLTYiYRLEIiaiIAIVIiLV]:iTLIFSYLASFVMDPQKHLK*GLHGNYIjLPSK*FFW^ 



330 360 390 417 447 474 504 534 

MFE I VLEKI KSHKWETTGI I VGLLLFGILGIiNHFG - THHKEDHIiNINLEK- KVSTITEKKVPMISHVKDKVSNQVTVDVK 
10 I :1 =111= 1=1 I II II = :=l|:| = = =11111 

MEailEKIKEYKirVaCTGLGLLVGGFFLLKPAPQTPVKETNLQAEWiAVSKDLVSEKEWKEEraEPL^ 
10 20 30 40 50 60 70 



564 594 624 654 684 714 744 774 

1 5 GAVNHPGVYSLPSQSROTDAIKEAGGLSNIiADSKSVOTj2VQKLQDEWIYTOQKGEKITVVXEEK^^ 

III II II: lh::|ll|: llll|:|lll|: II = = ll llh I 1= = = hll 

GaVKSPGIYDLPVGSRIMDAVQKAGGLTEQaDSKSIjKn:iAQKVSDEALVYVPTKGEE--AVSQQTC3IfiTASS 

90 100 110 120 130 140 150 



804 834 864 894 924 954 984 1014 

NKflDLSSLQTISGVGAKRAQDILDYRDSQGGFKTIDDLKOTSGIGEKTLEKl)RQDOTID*VFSSKTYLFSIVGLPt^ 
III I h : hi lllll|:|:|:: I W ■ ■ \ : W Mill 11 = 111= 11 = 1 
NKASLEELKQVKGLGGKRAQDIIDHREIANGKFKSVDELKKVSGIGGKTIEKLKDYVTVD 
170 180 190 200 210 

SEQ ID 8990 (GBS129) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 41 (lane 4; MW 43.8kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2175 

A DNA sequence (GBSx2292) was identified in S.agalactiae <SEQ ID 6719> which encodes the amino
acid sequence <SEQ ID 6720>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.01 Transmembrane 215 - 231 ( 208 - 240)

----- Final Results -----

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12793 GB:Z99109 similar to 1-acylglycerol-3-phosphate
O-acyltransferase [Bacillus subtilis]
Identities = 66/200 (33%), Positives = 111/200 (55%), Gaps = 10/200 (5%)



Query: 3 YTYLRTLVMFLIWVANGNTUrraNEDKMLroDENYILVRPHRTFWDPV^^ 62
y+ ++++G Y+E+LD+++ H+D + + PQ +
Sbjct: 2 YKFCANALKVILSLRGGVKrarKEN- -LPADSGFVIACTHSGWVDVITLGVGlLPYQIHy 59

Query: 63 MAKKELFTNRLPGWWIKMCGAFPIDREKPGQIMRYPVKMLKNSNRSLVMFPSGSEHSKD 122
MAKKELF N+ G ++K AFP+DRE PG +1+ P+K+LK + +FPSG4-R S+D
Sbjct: 60 MAKKELFQNtCWIGSFIiKKIHAFPVDRENPGPSSIKTPIKLLK-EGEIVGIFPSGTRTSED 118

Query: 123 V- -KGGVAVIAKMAKVRIMPAAYRGPMVPKNLLKGHRVDMNFGNPIDVSDIKRMDA-EGI 179
V K G lA+M K ++PAAY+GP KLK+++GP++D + + E +
Sbjct: 119 VPLKRGAVTIAQMGKAPLVPAAYQGPSSGKELFKKGKMKLIIGEPLH!^ffiFAHLPSKERL 178

Query: 180 A EVSRRIQEEFDRLDR 195
A 4-++RI+E ++LD+
Sbjct: 179 AAMTEALNQRIKELENKLDQ 198
A related DNA sequence was identified in S.pyogenes <SEQ ID 6721> which encodes the amino acid
sequence <SEQ ID 6722>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.83 Transmembrane 241 - 257 ( 234 - 266)
INTEGRAL Likelihood = -4.41 Transmembrane 27 - 43 ( 26 - 44)

----- Final Results -----

bacterial membrane --- Certainty=0.5734 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB12793 GB:Z99109 similar to 1-acylglycerol-3-phosphate
O-acyltransferase [Bacillus subtilis]
Identities = 59/198 (29%), Positives = 104/198 (51%), Gaps = 6/198 (3%)


Query: 29 YAYLRGLWFLLimHGMAHYHHEEKMI£aSENYILVaPHRTFWDPVYMAFAaRP^^ 88
Y+ ++L+G Y+E LA +++ H+D + + PQ +
Sbjct: 2 YKFCTmLKVILSIiRGGVKVSNKEN- -IiPADSGFVIACTHSGWVDVITLGVGILPYQIHY 59

Query: 89 MAKKELFANRLFAWWIKMCG!iPPIDRDKPSPnAIRYPVlM,KKSNRS 148
MAKKBLF N+ +-HK aPP+DR+ P P +1+ P+ +LK+ + +FPSG+R S++
Sbjct: 60 MAKKELFQNKWIGSFLXKIHftFPVDRENPGPSSIKTPIKLLKE-GEIVGlFPSGTRT^^ 118

Query: 149 V- -KGGVRVIAKLAKVKIMPAAYQGPMSVKBLLAGBRVDMTFGNPIDVSDIKRM-NDEGI 205
V K G IA++ K ++PAAYQGP SKL +++GP++D •H + E +
Sbjct: 119 VPLK3?GAVTIflQMGKAPLVPJ«lYQGPSSGKELFKKiGKMKLIIGEPLHQADFAHLPSKH?L 178

Query: 206 AEV2iHRIQaEFDRIDDEL 223
A + + ++++L
Sbjct: 179 AAMTEALNQRIKELENKL 196



An alignment of the GAS and GBS proteins is shown below.

Identities = 185/244 (76%) , Positives = 212/244 (86%) 

Query: 1 MFYTYLRTLVMFLIWVANGNAHYHNEDKMLKDDENYILVAPHRTPWDPVYMAPAARPKQF 60
+FY YLR LV+FL+WV NGNAHYH+B+KML ENYILVAPHRTFWDPVYMAFAARPKQF 
Sbjct: 27 VFYAYLRGLWFLLWVVNGNAHYHHEEKMLDASENYILVAPHRTFWDPVYMAFAARPKQF 86



Query: 61 IFMAKKELFTNRLFGWWIKMCGAFPIDREKPGQDAIRYPVKMLJO^SNRSLVMFPSGSRHS 120 

IFMAKKELF NRLF WWIKMCGAFPIDR+KP DAIRYPV MLK SNRSL+MFPSGSRHS 
Sbjct: 87 IFMAKKELFANRLFAITOIKMCGRFPIDRDKPSPDAIRYPVISIMLKKSNRSLLMPPSGSRHS 146 


Query: 121 KDVKGGUAVIAKMAKVRIMPAAYRGPMVFKm.LKEHRVDMNFGISIPIDVSDIK^^ 180 

++VKGGVAVIAK+AK7+IMPAAY+GPM K LL G RVDM FGNPIDVSDIKRM+ EGIA 
Sbjct: 147 QEWGGmVIAKLAKVKIMPAAYQGPMSVKGLLAGERVDMTFGNPIDVSDIKRMNDEGIA 206 

Query: 181 EVSRRIQEEFDRLDRENETYnDGKKLNPLTYIYRLPLAIIAIVLLVLTLIFSYLASFVWD 240

EV+ RIQ EFDR+D E + GK NPLTY+YRLPL ++ +V+L+LT++FSY+ASFVW+ 
Sbjct: 207 EVANRIQAEFDRIDDELAPFQPGKaRNPLTYLYRLPLGLVLWVLLLTMLFSYIASFVWN 266 

Query: 241 PQKH 244 
P KH

Sbjct: 267 PDKH 270 

SEQ ID 6720 (GBS171) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 36 (lane 2; MW 25kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 3; MW 49.8kDa).
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2176

A DNA sequence (GBSx2293) was identified in S.agalactiae <SEQ ID 6723> which encodes the amino 
acid sequence <SEQ ID 6724>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.3268 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11810 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]
Identities = 113/244 (46%), Positives = 173/244 (70%), Gaps = 2/244 (0%)



LKENERIDQLPSTDVKIIQNKEVFSYSIDSVLLSRFPKLP-SRGLIVDLCSGNGaVGLFA 64 
L ++ER+D L + D+KIIQ+ VF++S+D+VLI1S+F +P +G IVDLC+GNG V L 
IJIDDERLDYLLAEDiyiKIIQSPWFAFSUaAVLLSKFAYVPIQKGKIVDLCTGNBIVPLLL 63 



125 PYFKASETSKKNLSPHYLLflRHEITTWLREICQIAQHaLKTKGRIAWVHRPDRFLEIlDT 134 

PYFK 4 +++N++ H +ARHEI L ++ ++ LK G+ A+VHRP R LEI + 
124 PYFKTPKQTEQNMNEHLRIARHEIHCTLEDVISVSSKLLKQGGKAALVHRPGRLLEIFEL 183 

185 MRQFNLAPKRIQFVYPKLGKDAIMLLIEAIKDGSTEGMKILPPLWHQDHGDYTETIFDI 244 

M+ + + PKR+QFVYPK GK+AN IK G + +KILPPL V+ + +YT+ I I 

184 MKAYQIEPKEVQFVYPKQGKEaNTILVEQIEGGRPD-LKILPPLFVYDEGNEyTKEIRTI 242 

245 YPGE 248 
+G+ 

243 LYGD 246 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6725> which encodes the amino acid 
sequence <SEQ ID 6726>. Analysis of this protein sequence reveals the following: 



Possible site: 



N-terminal signal sequence



----- Final Results -----

bacterial cytoplasm --- Certainty=0.2183 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 200/257 (77%) , Positives = 228/257 (87%) , Gaps = 3/257 (1%) 

Query: 1 MIDTILKElffiRIDQLFSTDVKIICSNKEVFSYSIDSVIiSRPPKLPSRGLIVDLCSGNGAV 50 

MI ILKE ERIDQLFS+DV II(2NK+VFSYSIDSVLLSRFPK+PS+6LIVnLCSGNGAV 
Sbjct: 1 MIKRILKEGERIDQLFSSDVGIIQNKDVFSYSIDSVIiLSRFPKMPSKHLIVDLCSGNGAV 60 

Query: 61 GDPASTEnaTIIEIELQESLADMAKRSIKIiNKIiEKQVTMINDDLK^ 120 

GLFAST+T A I+E+ELQE lADM +RSI+I1N+LE QVTMI DDLKNLL+HV RS VDLM 
Sbjct: 61 GLPASTRTKAAIVEVEI^ERLADMGQRSIQUlQLEDQVTMICDDLKm.IiNHVPRSCSVDLM 120 



Query: 121 LGNPPYFKASETSKKNLSPHYIiIiUfflEITTraiREICQIAQHALKTK^^ 180 
LCNPPYFK+ E+SKKN+S HYLLAEHE+nm EICQ+A+HALK+ GR+AMVHRPDRFIiE
Sbjct: 121 LCNPPyPKSHESSKKNVSEHyLLaEHEVTTKn^EICQVARHALKSNGRI^ 180 

Query: 181 IIiyimQBTOAPKRIQFVyPia:jGKDANMLIiIERIKDGSTEGMKrr.PPIiVVHQDNGDy 240 
IID++R lAPKR+QFVYPKLGK AHMLLIEaiKDGS EGM ILPPLVVH++NG+YT+

Sbjct: 181 IIDSLRiaiGLaPiaiVQFVYPmSKSMIMLIjIEAIKDGSIEGmiLPPLVVHKEN 240 

Query: 241 IFDIYFGENGK---SHD 254 
IP+IYFG K +HD 
Sbjct: 241 IFEIYFGRRSKGKPNHD 257

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2177 

A DNA sequence (GBSx2294) was identified in S.agalactiae <SEQ ID 6727> which encodes the amino
acid sequence <SEQ ID 6728>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.1512 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11811 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]
Identities = 40/82 (48%), Positives = 63/82 (76%)

Query: 7 YMYVLECSDGTLYTGYTTDVKRKiaramQKGaKYTIU^PVKIiYSEAENSKQ 66 
+ YV++C D + Y GYT D+ +R+ THN GKGRKYTH- R PV+L+++E+F++K+EJM1+M1

Sbjct: 7 FFYVVKCKDNSWYAGYTNDI^KRVia'HHDGKGAKYTKVHRPVELIFflESFSTKREAMQaE 66 

Query: 67 ALFKQKTRQAiOjTYIKQHKNEQ 88 
FK+ TR+ K YI++ +N + 
Sbjct: 67 YYFKKLTRKKKELYIEEKRNSK 88

A related DNA sequence was identified in S.pyogenes <SEQ ID 6729> which encodes the amino acid 
sequence <SEQ ID 6730>. Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.1838 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/84 (71%) , Positives = 67/84 (79%) , Gaps = 1/84 (1%) 

Query: 6 AyMm.ECSDGTLYTGYTTDVia«IjNTHISrroK6AKYTRARLPVKLLYSEAF^ 65

AYMYVLEC D TLYT6YTTD+K+RL THN GKGAKYTR RLPV LLY E F+SK+ AM A 
Sbjct: 6 AYMYVLECVDKTLYTSYTTDLKKRLATHNAGKaAKYTRYRLFVSLLYYEVFDSKE^ 65 

Query: 66 ERLF-KQKrRQAKLTYIKQHKNEQ 88 
BALF K+KTR KL YI H+ E+

Sbjct: 66 EftLFKKRKTRSQKLAYIATHQKEK 89 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2178 

A DNA sequence (GBSx2295) was identified in S.agalactiae <SEQ ID 6731> which encodes the amino
acid sequence <SEQ ID 6732>. This protein is predicted to be autoaggregation-mediating protein (deaD).
Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.2287 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD20136 GB:AF091502 autoaggregation-mediating protein
[Lactobacillus reuteri]
Identities = 289/504 (57%), Positives = 366/504 (72%), Gaps = 18/504 (3%)

Query: 1 MKFTEaaJLSQDILSRVEKftGFVEPSPIQEMTIPLRLEGKDVIGQ^ 60

MKF+EL LS +L A++++G+ E +PIQE TIP+ LEGKDVIGQAQTGT6KTARPGLP + 
Sbjct: 1 MKFSELGLSDSLLKAIKRSGYEEATPIQEtJTIPMVLEGKDVIGQAQTGTGKTARPGLPII 60 

Query: 61 WKIHTEDNTIQALIIAPTRELAVQSQEELFRRSRDKBVKVRSVYGGSS IEKQIKALRSGA 120 
+ TE+ IQA+II+PTRELA+Q+QEEL+R G+DK V+Vi VYGG+ I +QIK+L+

Sbjct: 61 ENVDTENPNIQRIIISPTREIAIQTQEELYRLGKDKHVRVQWYGGADIRRQIKSLKQHP 120 

Query: 121 HVWGTPGRLLDLIKRKALKLNHIETLILDEWDEMLNMGFLEDIEAIISRVPETRQTLLF 180 
++VGTPGR1, D I R +KL+HI+TL+LDEADBMLNMGFLEDIE+II P+ RQTLLF 
Sbjct: 121 QILVGTPGRLRDHINRHTVKLDHIKTLVLDEADEMMMGFLEDIESIIKETPDDRQTLLF 180



Query: 181 SRTMPDPIKRIGVKFMKDPEHVKIKATELTIWIWDQYTTOVKENEKFDTMTRIMJTO^ 240 

SATMP liCRlGV+FM DPE V+IKA. ELT VDQTrVR ++ EKFD MTRD+DV P+ 
Sbjct: 181 SATMPPEIKRIGVQFMSDPETVRIKAKELTTDLVDQYYVRARDYEKFDIMTRLIDVQDPD 240 

Query: 241 LSIVFQRTKRRVDELTRGLKLRGFEAEGIHGDLDQtnaUiRVIRDFKNDHIDILVATDVAA 300 

LH-IVFGRTKRRVDBL++GL R6+ A GIHGDL Q+KR +++ FKN+ +DILVAT0VAA 
Sbjct: 241 LTIVPGRTKRRVDELSKGLIflSGXliaAGIHGDLTQDKRSKIMWKFKNMELD 300 

Query: 301 RGLDISGVTHVYNTOIPQDPESYVHRIGRTGRAGKSGQSITFVSENEMGyLTIIENLTKK 360 

RGLDISGVTHVYNYDIP DP+SYVHRIGRTGRAG G S+TFV+PNEM YL IE LT4 
Sbjct: 301 RGLDISGVTHVYNYDIPSDPDSYVHRIGRTGRAGHHGVSLTFVTPNEMDyLHEIEKLTRV 360 





Query: 361 RMTGMKPATASEAFQAKKKVALKRIARDFED-QELVSK--FDKFKADALELATQYTPEEL 417
RM +KP TA EAF+ ++A F D EL+++ D+++ A +L + +L
Sbjct: 361 RMLPLKPPTAEEAFKG QVASAFNDIDELIAQDSTDRYEEAAEKLLETHNATDL 413

Query: 418 AL'Sn7LSLTVQDPESLPEVEITREKPLPFKPSGGGFKGKGGRGNGRGGD--RRRNDRGDRR 475
+L+ ++ S V+IT E+PLP + G R N GG+ RR+N R +
Sbjct: 414 VRALIiNMMTKEAASEVPVKITPERPLPRRNKRN- -HRNGNRNNSHGGNHYRRKNFRRHQH 471

Query: 476 GNRDRDDRG SRCDFKRRDDK 495
G+ D+ G SR F R K
Sbjct: 472 GSHRNDNHGKSHSSRHSFNIRHRK 495

A related DNA sequence was identified in S.pyogenes <SEQ ID 6733> which encodes the amino acid 
sequence <SEQ ID 6734>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.1108 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 430/545 (78%) , Positives = 463/545 (84%) , Gaps = 24/545 (4%) 

Query: 1 MKFTEIiNLSQDILSAVEKAGFVEPSPlQEMTIPLALEGKDVIGQAQTGTGK^^ 60 

+KFTE NIiSQDI SAV AGF + SPIQEMTIPLALEGKDV1GQAQTGTGKTAAFGI>PTL 
Sbjct: 1 LKFTEFNLSQDIQSAWTAGFEKaSPIQEMTIPLftLEGKDVIGQAQTGTGKTAAFGLPTL 60 

Query: 61 NKIHTEDNTIQALIIAPTRELAVQSQEELFRFGRDKGVKVRSVYGGSSIEKQIKALRSGA 120 

NKI T +M IQAL+IAPTREIAVQSQEELFRFGR+KGVKVRSVYGGSSIEKQIKAL+SGA 
Sbjct: 61 NKIRTNENIIQALVIAPTRELAVQSQEELFRFGREKGVKVRSVYGGSSIEKQIKAIiKSGA 120 

Query: 121 PI\AA7GTPGRLLDLIKRKALKIiNHIETLIIiDEADEMLNIMBFLEDIEAIISRVPETRQTLLF 180 

H+WGTPGRLLDLIKRKaii L+H+ETLILDEADEMLHMGELEDIEAIISRVP RQTLLF 
Sbjct: 121 HIVVGTPGRIiLDBIKRKALIIjDHVETLILDEaDEMIiNMGFriEDIEAIISRVPaDRQTLLF 180 

Query: 181 SaTMPDPIKRIGVKEMKDPEHVKIKaTELTirarroQYYVR\nCEmKFDTMTRIM3V^ 240 

, SAIMP PIK+IGVKBMKDPEHV+IK ELraVHVDQYYVRVKE EKFDTMTRLMDV+QPE 
Sbjct: 181 SRTMPAPIKQIGVraWKDPEHVQIKNKELTHVimiQYYVRVKEQEKFDTMTRLI^^ 240 

Query: 241 LSIVFGRTKRRVDELTRGLKLRGFRAEGIHGDLDQNKSLRVIRDFKNDHIDILVATDVAA 300 

LSXVRSRTKRRVDE+TRGLKLRGFRAEGIHGDLDQNKRIjRVIRDFKHD IDILVATDVAA 
Sbjct: 241 LSIVFGRTKRRVDEITRGLKLRGFRAEGIHGDLDQNKRLRVIRDFKNDQIDILVATDVAA 300 

Query: 301 RSLDISGVTHVYNYDIPQDPESYVHRIGRTGRAGKSGQSITFVSENEMGYLTIIENLTKK 360 

RGLDISGVTHVYNYDI QDPESYVHRIGRTGRAGKSG+SITFySPlIEMGYL++IENLTKK 
Sbjct: 301 RGLDISGVTHVYNYDITQDPESYVHRIGRTGRAGKSGESITFVSENEMGYLSMIENLTKK 360 

Query: 361 R^m3MKPATASEAFQaKKKV2iLKRIARDFEDQEI:lVSKFDKFKADAI^EIJATQYTPEEIlaLY 420 

+M ++PATA EAFQAKKKVALK+I RDF D+ + S FDKFK DA++I1A ++TPEELALY 
Sbjct: 361 QMKPLRPATAEEAFQAKKKVALKKIERDFADETIRSNPDKFKGDAVQLAAEFTPEE]^ 420 

Query: 421 VLSLTVQDPESI1PEVEITREKPLPFKPSGGGF---KGKGGRG— NGRGGDRRRNDRGDR- 474 
4-LSLTVQDP+SLPEVEl REKPLPFK GGG GKGGRG N GDRR RGDR 

Sbjct: ' 



Query: 523 RNKGD 527 
R+KG+ 

Sbjct: 535 RHKGE 539

A related GBS gene <SEQ ID 8991> and protein <SEQ ID 8992> were also identified. Analysis of this
protein sequence reveals the following:

RGD motif 471-473 

The protein has homology with the following sequences in the databases: 

58.9/74.7% over 494aa
Lactobacillus reuteri
GP|4409804| autoaggregation-mediating protein Insert characterized
ORF01925(301 - 1785 of 2184)
GP|4409804|gb|AAD20136.1||AF091502(1 - 495 of 497) autoaggregation-mediating protein
{Lactobacillus reuteri}
%Match = 37.3
%Identity = 58.8 %Similarity = 74.6
Matches = 290 Mismatches = 118 Conservative Sub.s = 78

42 72 102 132 162 192 222 252 
IRHYITICEIPSEAAVAF*IDKL*TLLLYRmWFIAFFLFSEATNRTSNL*KRVIY*IDLILYLFTFNCVTLSRLSEKITN



IffiS*GSFflLSFRKEKHLKrTE™iSQDILSAVEKAGFVEPSPrQH^lTIPIALEGKDVIGQAQTGTGKTAAF 

. :||:|| II |::::|: 1 Ulil- llh II II I I I I I II I I 1 II I I II I I I = t 

MKFSELGLSDSLLKAIKRSGYEEATPIQEQTIPMVLEGKDVIGQAQTGTGKTAAFGLPIIENVD 



TEDITOIQALIIAPTRELAVQSQEELFRFGRDKGVKVRSVYGGSSIEKQIKAIJJSGAHWVGTPGRLLDLIKRKALKIim 
lb ll|:||:||lllhhlll|:|:hll bb lllj: I Hlbb :::|llllll I I I Hbll 



ETLILDEaDE^^MGFLEDIEaIISRVPETRQTU:lFSATMPDPIKRIGVKFMKDPEHVKIKaTELTMW^ 
:||:||||||llllllllll|:|l h lllllllllll lllllbll III hill III lllllll - 
KTIiVLDEOTEMIOTGFLEaSIESlIKETPDDRQTIiLFSATMPPEIKRIGVQFMSDPETVRIKAKELTTDLVDQYYV^^ 



1002 1032 1062 1092 1122 1152 1182 1212 

EKFDT^OTRLMIOT)QPELSIVFGRTKERVDELTRGLKIlRGFRftEGIHGDIlDQNK^ 

nil IIIMI |:|:|lllllllllll|::|| lb I llllll bll : = = lib : I I I I I I I I I I I I I I 



1242 1272 1302 1332 1362 1392 1422 1452 

ISGVTHVYmTIPQDPESYVHRIGRTGRAGKSGQSITFVSPNEMGYLTlIElJLTKKRMTGMKPATASEAFQAKKKVAIjKR 

llllllllllll I Iblllllllllllll I bllbllll II II lb II :|| II III I :|l 

ISGVTHVYNYDIPSDPDSYVHRIGRTGRAGHHGVSLTFVTPNEMDYLHEIEKLTRVRMLPLKPPTAEEAF--KGQVA 

320 330 340 350 360 370 

1479 1503 1533 15S3 1593 1623 1653 1683 

IARDFED-QELVSK--FDKFKADALEIATQYTPEELALYVLSLTVQDPESLPEVEITREKPLPFKPSGGGFKGKGGRGNG 

I I lb:: b:: | :1 : :| :: | |:|| |:||| : | |1 

- -SAFBTOIDELIAQDSTDRYBEflAEKLLETHmTDLVAAIiLNNMTKEAASEVPTOITPERPLPRRNKiaM^ -RMNS 
390 400 410 420 430 440 450 

1707 1737 1755 1785 1815 1845 1875 1905 

RGGD— RRRMDRGDRRGNRDRDDRG SRCDPKRRDDK 

lb Ibl I : b b I II I I I 



There is also homology to SEQ ID 4454. 
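
The ORF coordinate line in the database summary, ORF01925(301 - 1785 of 2184), can be related to the protein range of the hit, (1 - 495 of 497). The sketch below is an illustration under a stated assumption: it treats the two ORF coordinates as inclusive nucleotide positions, which the text does not define, and orf_span is a hypothetical helper name.

```python
import re

# Pattern for entries of the form "ORFnnnnn(start - end of total)".
ORF_RE = re.compile(r"(ORF\d+)\((\d+)\s*-\s*(\d+)\s+of\s+(\d+)\)")


def orf_span(text):
    """Return the name, nucleotide span and codon count of the first ORF entry."""
    match = ORF_RE.search(text)
    if match is None:
        raise ValueError("no ORF coordinate entry found")
    name = match.group(1)
    start, end, total = (int(match.group(i)) for i in (2, 3, 4))
    nucleotides = end - start + 1  # assumption: coordinates are inclusive
    codons, remainder = divmod(nucleotides, 3)
    return {
        "name": name,
        "nucleotides": nucleotides,
        "codons": codons,
        "in_frame": remainder == 0,
        "sequence_length": total,
    }
```

Under that assumption the span 301 - 1785 covers 1485 nucleotides, that is 495 codons, which agrees with the 495 residues of the database protein covered by the hit.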

SEQ ID 8992 (GBS307) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 56 (lane 7; MW 62kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 61 (lane 2; MW 86.7kDa).

The GBS307-GST fusion product was purified (Figure 208, lane 9; Figure 225, lane 10-11) and used to 
immunise mice. The resulting antiserum was used for FACS (Figure 272), which confirmed that the protein 
is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2179 

A DNA sequence (GBSx2296) was identified in S.agalactiae <SEQ ID 6735> which encodes the amino 
acid sequence <SEQ ID 6736>. This protein is predicted to be outer membrane protein (yaeC). Analysis of 
this protein sequence reveals the following:
Possible site: 19

>>> May be a lipoprotein

----- Final Results -----

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73036 GB:AL139076 putative periplasmic protein [Campylobacter jejuni]
Identities = 89/237 (37%), Positives = 132/237 (55%), Gaps = 3/237 (1%)

Query: 40 ITVATYSKETSTFIiDLVKDNVKEKBYTLKVVMVSDYIQANIALEaaKEHD^ 99 
15 IT+ P + L+L+KD+ K KEY LK+V SDYI N ALE KE DMSII, QH+ F+ 

Sbjct: 23 ITIGa.TPNPFGSIJuEimDDFKl!TO3YELKIVBFSDYILPJSlRAriEEKE^ 82 



Query: 100 IFimaTCHLVSITPIYHSLafiFYGQHLKNIAELKriGaKmiPSDPflOTraJA^ 159 
+N + +L++ TP+ + G Y + +KN+ LK+GA+VMP+D N +E7^ riL++ K 
20 Sbjct: 83 BYNLKKGSNLIATTFVLiaFVGVySKKIKNIjENLKEGfiRVAIPNmTNESRM.ELr.EKAK 142 



Query: 160 LITLKNTSKKTKA.IEDIITNPKKLR1EFVALLNLNQAYFEYDLVFNFPGYVTKINLVPKR 219 

LI L + KT DI NPKKL+ + L+A+D-H + LP + 

Sbjct: 143 LIELNKNTLKTPL--DINKNPKKLKFIELKAAQLPHALDDVDIAIINSNFALGAGLNPSK 200 

Query: 220 DRLLYEKKPDIRFAGALVAREDNKNSDKIKVLKEVLTSKEIRHYITKEIPSEAAVAP 276 

D+EK++ +VR + KNS+K KV+ E+L S + + I + AF 
Sbjct: 201 DTIFREDK-i: 



SEQ ID 6736 (GBS126) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 34 (lane 7; MW 32kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2180 

A DNA sequence (GBSx2297) was identified in S.agalactiae <SEQ ID 6737> which encodes the amino
acid sequence <SEQ ID 6738>. This protein is predicted to be probable permease of ABC transporter.
Analysis of this protein sequence reveals the following:



3 N-terminal signal sequence 



Likelihood = 
Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



190 - 206 



187 - 21S: 



bacterial membrane --- Certainty=0.5798 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG08889 GB:AE004963 probable permease of ABC transporter
[Pseudomonas aeruginosa]
Identities = 80/206 (38%) , Positives = 127/206 (60%) , Gaps = 4/206 (1%) 

Query: 15 SFWEraiMLQLTLILCFLIAFPTGILLFSLRKSYLIKHSLAYQLLNLFLGTLRSVPFLIF 74 

+FW MLG +L+ ++ P G+LLF + + Y LL+L + LRS+PF+I 

Sbjct: 24 TFW MLGGSLLFTWLGLPLGVLLFLTGPRQMFEQKAAnfTLLSLWNILRSLPFIIL 79 
Query: 75 IFILIPIMlLIFGTSFGTIAAILPLTI.VSVSLYARYVEQ!\IjtNlPQVVVDRALSLGANKR 134

+ ++IPL LI GTS G AI Pti + + +AR VE AL + + +++ ++GA+ R 
Sbjct: 80 LIVMIPLTVLITGTSLGVRGAIPPLVVGATPFFARIiVETMiREVDKGIIEATCJAMGRSTO 139 

Query: 135 QIIYYFLIPSIKIDLVLSFTATAISILGYSTIMGVIGAGaiiGEYAYRFGYQBYDYPVMYl 194 

QII+ L+P + ++ + T TAI+++ Y+ + GV+GAGGIiG+ A RFGYQ + VM + 
Sbjct: 140 QIIWmLLPElUlPGIIARITVTAITLVSYTAftlAGVVGAGGLGDIiAIRFGYQRFQTDVMVV 199 

Query: 195 IWLFIIYVFILQSLGYFIANRYSRK 220 

W+ +1 V ILQ++G + +SRK 
Sbjct: 200 TWMLLILVQILQTVGDKLWHFSRK 225 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2181 

A DNA sequence (GBSx2298) was identified in S.agalactiae <SEQ ID 6739> which encodes the amino 
acid sequence <SEQ ID 6740>. This protein is predicted to be ABC transporter, ATP-binding protein 
(oppF). Analysis of this protein sequence reveals the following:

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5454 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9333> which encodes amino acid sequence <SEQ ID 9334> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC22280 GB:U32744 ABC transporter, ATP-binding protein 
[Haemophilus influenzae Rd]
Identities = 62/174 (35%) , Positives = 104/174 (59%) , Gaps = 2/174 (1%) 

35 Query: 1 MKMINGI.IPYDKGOTYYQGKEVKSFSrmLRQ^KKDIAYIFQ^^^&LA6ESVYYHIALVY 60 

++ +N L G++ G E+ 3D +L R+ I IFQ+ NLL+ +V+ ++AL 

Sbjct: 48 IRCWLLEKPTSGSVIVDGVELTia.SDRELVIARRQI(MIFQHEISILLSSRTVFENVALPL 107 

Query: 61 KIJSIHQKVN--HDAIITOILDPLGLMDLKQVKCHSLSGGQQQKWAIAMAVLQKPKLILCDEI 118 
40 . +L + +1 +LD -^GL + + -LSGGQ+Q+VAIA A+ PK++LCDE 

Sbjct: 108 EliESESKAKIQEKITALLDLVGjSEKRDAYPSNLSGCSaKQRVAIARALASDPKVLLCDER 167 

Query: 119 SSALDTNSEKEIPNIiIiSDLREKYGISILMlAHHLSLLKQYCDRVMlLDHQTIVD 172 
+SALD + + I IiL ++ GI+IL+I H + ++KQ CD+V ++D +V+ 
45 Sbjct: 168 TSAIIJPATTQSILKLLKElNRTLGITILLITHEMEVVKaiCDQVAVIDQGRLVE 221 

There is also homology to SEQ ID 76. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 

vaccines or diagnostics. 
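
The homology summaries in these examples report identities, positives and gaps as count/length ratios with a whole-number percentage. As a minimal sketch only, the following Python snippet shows how such a summary line could be read programmatically. The function name `parse_summary` and the regular expression are illustrative assumptions and are not part of the original analysis; the percentages printed for this example appear consistent with truncating (not rounding) the ratio, which is what the sketch assumes.

```python
import re

# Hypothetical helper (not from the original analysis): reads a BLAST-style
# summary line such as the one reported for SEQ ID 6740 above.
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, length, percent)} for each field found.

    percent is computed with integer division, i.e. truncated
    (an assumption that matches the figures in this example).
    """
    stats = {}
    for name, count, length in _FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, count * 100 // length)
    return stats


summary = parse_summary(
    "Identities = 62/174 (35%), Positives = 104/174 (59%), Gaps = 2/174 (1%)"
)
for field, (count, length, percent) in summary.items():
    print(f"{field}: {count}/{length} = {percent}%")
```

The same function can be applied to any of the other "Identities = ..." lines; lines without a Gaps field simply yield two entries.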

Example 2182

A DNA sequence (GBSx2299) was identified in S.agalactiae <SEQ ID 6741> which encodes the amino 
acid sequence <SEQ ID 6742>. Analysis of this protein sequence reveals the following: 
Possible site: 21 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
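
Each example lists a "Final Results" block giving a certainty value for three candidate locations (membrane, outside, cytoplasm). The sketch below, offered under stated assumptions, shows one way to extract those values and pick the highest-scoring location. The names `parse_final_results` and `best_location` are hypothetical, and the sample text reuses the values reported for SEQ ID 6740 in Example 2181; it is not a reimplementation of the prediction program itself.

```python
import re

# Hypothetical pattern for lines like
# "bacterial cytoplasm --- Certainty=0.5454 (Affirmative) < succ>".
_RESULT = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    return [
        (loc, float(value), verdict.strip())
        for loc, value, verdict in _RESULT.findall(text)
    ]


def best_location(results):
    """Return the tuple with the highest certainty, or None if empty."""
    return max(results, key=lambda r: r[1]) if results else None


sample = """
bacterial cytoplasm --- Certainty=0.5454 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(sample)
best = best_location(results)
print(best)
```

When all three certainties are 0.0000, as in Example 2182, `max` returns the first entry, so a caller may prefer to treat that case as undetermined.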

Example 2183 

A DNA sequence (GBSx2300) was identified in S.agalactiae <SEQ ID 6743> which encodes the amino 
acid sequence <SEQ ID 6744>. Analysis of this protein sequence reveals the following: 

N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0904 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9741> which encodes amino acid sequence <SEQ ID 9742> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB87515 GB:AF034138 unknown [Bacillus subtilis]
Identities = 74/125 (59%) , Positives = 92/125 (73%) 

Query: 5 MC3IFSGIJraiASQMim)KSmiQLSDILISDEaVDLAYTIiIRDLIVFTNYRLILVDKQGTO 64 
30 MG GIi+GNAS + T V+ +L+ IL+ E+V+ A+ L+RDL1VFT+ RLILVDROG+T 

Sbjct: 1 MGFIDGLLGNRSTLSTAAVQEEIAHILIiBGEKVBlU^KLVRDLlVI'TDKRLiriVDKQGIT 60 

Query: 65 GKKVSyNSIPYASISRFTVETSGHFDLDAELKIWISSAIEPAEVLQFKNDRNIVSIQKAL 124 
GKK + SIPY SISRF+VET+G FDLD+ELKIWIS A PA QFK D +1 IQK L 
35 Sbjct: 61 GKKTEFQSIPYKSISRFSVETAGRFDtDSELKIWISGAELPAVSKQFKKDESIYDIQKVL 120 

Query: 125 ATAVL 129 

A + 
Sbjct: 121 AAVCM 125 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for

vaccines or diagnostics. 

Example 2184 

A DNA sequence (GBSx2301) was identified in S.agalactiae <SEQ ID 6745> which encodes the amino
acid sequence <SEQ ID 6746>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 
- Certainty=0.0000 (Not Clear) < succ>
- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9331> which encodes amino acid sequence <SEQ ID 9332>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA74739 GB:Y14370 peptide chain release factor 3
[Staphylococcus aureus]
Identities = 274/462 (59%), Positives = 349/462 (75%), Gaps = 9/462 (1%) 

Query: 1 MDIEKQRGISVTSSVMQFDYAGKRWIIJZrPGHEDFSEDTYRTLMaVDAAVMVVDfiAKGI 60 

M +E++RG1SVTSSVMQFDY +NII.DTPGHEDFSEDTYRTLMAVD+AVMV+D AKG+ 
Sbjct: 57 MKOTlQERGISOTSSVMQFDYDDYEINIIOTPGHEDFSEDTYRIIiMAVDSAVMVIDCftKGV IIS 

Query: 61 EAQTKKLFEVVKHRNIPVFTFI1JKLDRIX3KEPI£)LLEELEEVLGIASYPMNWPI^^ 120 

E T KLF+V K R IP+FTFINKLDR G+EP H-LL+E+EE L 1 +YPMNWPIGMG+SF 
Sbjct: 117 EPPTLKLFKVCKMRGIPIFTFIimiDRTOKEPPELLDEIEEmJIETYPMNWPIGMGQSF 176 

Query: 121 EGIiYDLHNKRLELYKGDERFASIEDG DQLFANNPFYECJVKEDIELLQEAGNDFSE 175 

G+ D +K +E ++ +E •!• D D N+ +EQ E++ L++EftG F 

Sbjct: 177 FGIlDRKSKTIEPFRIffiENILHIJroDFErSEDHaiTiroSDPEQRIEEUMLVEEaGEaFnH 236 

Query: 176 QAII^DLTPVPFGSftLTNFGVQTFrOTFIiEFAPEPHGHKTTHGOTIDPIJUaJFSGFW 235 

A+L GDLTPVFPGSAL NFGVQ FL+ +++FAP P+ +T E + P FSGF+FK 
Sbjct: 237 DftLIiSGDLTPVFFGSaiJmFGVGNPimYVDFAPMPNftRQTKENVEVSPFDDSF^^ 296 

Query: 236 IQANMDPRHRDRIAFVRIVSGEFERGMGVOTLTRTGKGAKLSNVTQFMAES-R^^ 294 

IQANMDP+HRDRIAF+R+VSG PER + + L +K S+V + + ++++ V +AVA 

Sbjct: 297 IQANmPKHRDRIAFMRWSGAFER-VMimLCNVLIKSKRSHVQRHLWQTIKKLVNHA^ 355 

Query: 295 GDIIGVyDTGTYQVtroLTVGKNKFEEEELPTFTPEIjFMKVSAKNVMKQKSFHKGIEOLV 354 

GDIIG+YDTG YQ+GDTL GK 4- F+ LP PTPE+FMKVSRKNVMKQK FHKGIEQLV 
Sbjct: 356 GDIIGLYDTGKYQIGDTLVGGKQTYSFQDIJQFTPEIFMKVSAKNVMKQKHFinra 415 

Query: 355 QEGAIQLYKNYQTGEYMI£aVGQLQFEVFKHFMEGEYHAEWOTPMGKKTWW--INSDD 412 

QEGAIQ YK T + +I1GAVGQLQFEVF+HRM+ EYN +WM P+G+K RW N D 
Sbjct: 416 QEGAIQYYKTLHTNQIILGAVGQLQFEVFEHRMKNEYlWDWMEPVGRKIARmiENEDQ 475 

Query: 413 LDERMSSSENlLAKDRFDQPVFIjFENDFALRWPADKYPDViaj 454 

+ ++M++SR+IL KDR+D VFLFEN+FA RWF +K+P++KL 
Sbjct: 476 ITDKMNTSRSILVKDRYDDLVFLPENEFATRWFEEKFPEIKL 517 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6747> which encodes the amino acid 
sequence <SEQ ID 6748>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2070 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 447/458 (97%) , Positives = 455/458 (98%) 

Query: 1 MDlEKQRGISVTSSVMQFDYAGKRVNILDTPGHEDFSEDTYRTLMAVDAAVMVVtlSAKGI 60 

MDIEKQRGISVTSSVMQFDYAGIOlVNILiyrPGHEDFSEDTYRTimVEAAVMWDSAKGI 
Sbjct: 57 MDIEKQRGlSVTSSV^X3PDYAGKRVNIU^TPGHEDFSEDTYRTIi^avnAAVMWDEAKGI 115 

Query: 61 EAQTOKLPEVVKHRNIPVFTFINKLDRDGREPLDLLEELEEVU3IASYPMNWPIGMGKSF 120 

EAQTKKLPEVVKHRHIPVFTFIinajDRDGREPL+LLEELEEVLGIASYPMNWPIGMG++ 
Sbjct: 117 EAQTKKLFEVVKHRNIPVFTFINKLDRIXSREPLELriEELEEVrfilASYPMISIWPK^ 176 
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Query: 121 EBLYDLHNKRLELYKGDERFASIEDGDQLFANNPFyEQVKEDISLLQEAGNDFSEQAILD 180 

EGLyDLHNKRLELYKGDERFASlEDGDQLFANNPF\'EQVKEDIELLQEAGNDFSEQAILD 
Sbjcb: 177 EGLYDLHNIOJLEIiYKGDEREaSIEDGDQIiFANNPFYEQVKEDIELLQEAGNDFSEQAILD 236 

Query: 181 GDLTPVFFGSflLTWFGVQTFI^TFLEFAPEPHGHKTTEGOTIDPrAKDFSGFVFKIQANM 240 

GDLTPVFFGSaLTNFGVQTFLDTFLEFAPEPHGHKTTEGNV+DPLAKDFSGFVFKIQaNM 
Sbjct: 237 GDLTPVFFGSaLTNFGVQTFIOT:FLEFAPEPHGHKTTEGNVVDP]aKDFSGFVFKIQA^^ 296 

Query: 241 DPRHRDRIAFVRIVSGEFEROiKVNLTRTGKGaKLSNVTQFMRESRENVTim^ 300 

DP+HRDRIAFVRIVSGEFERGMCSVHLTRTGKGAKLSHVTQFMAESREN^ 
Sbjct: 297 DPKHRDRIftFTOIVSGEFERGMGVMjTRTGKGRKLSim'QFI^AaSEJEIWTNAVAGDIIGV 356 

Query: 301 yDTGTYQVBDTLTVGKNKFBPEPLPTFTPELEMKVSAKNVMKQKSFHKGIEQLVQEGAIQ 360 

YDTGTYQVGDTLTVGKNKFEFEPLPTFTPE+FMKVS KNVMKQKSFHKGIEQLVQEGAIQ 
Sbjct: 357 YDTGTYQVGDTLTVGKNKFEFEPLPTFTPEIEMKVSPKNVMKQKSFHKGIEQLVQKQRIQ 416 

Query: 361 LYKNYQTGEYMLGAVGQLQFEVFKHRMBGEYNAEVVMTPMGKKTVRWINSDDLDERMSSS 420 

LYKlIYQTGEYMLGAVGQLQFEWKHRMEGEYNAEVVMTPMGKiaWWI+ DDLD+RMSSS 
Sbjct: 417 LYKNYQTOEYMLGAVGQLQFEVFKHRJffiGEYNaEVVMTPMGKKOTRWISEDDLDQHMSSS 476 

Query: 421 RNIIAKDRFDQPVFLFENDEALRWFADKYPDVKLEEKM 458 

RNILRKDRFDQPVFLFENDFALRWFADKYPDV LEEKM 
Sbjct: 477 RNIIiAKDREDQPVFLFEiroEaLRWFADiOTDVTIiBBKM 514 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2185 

A DNA sequence (GBSx2302) was identified in S.agalactiae <SEQ ID 6749> which encodes the amino 
acid sequence <SEQ ID 6750>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3061 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 

+IiEFARQK3GKE+KEMavrFVTNERSHEIjNL+'XRDT+RPTDVISi:iEYKPE +SFDEEDL 
Sbjct: 23 ILEFAAQKTCKEDKEMAVTFraffiRSHEIMiKraDTNRPTDVISI^ 82 

Query: 61 AENPBLaEMLEDFDSYiaELFISIDK2\KEQftEEYGHSYEREMGFriaVHaFIiHIWGYnHXT 120 

A++P+LAE+L +FD+YIGELF1S+DKR+EQA+EYGHS+EREMGFLKVHGFLHINGYDHYT 
Sbjct: 83 ADDPDLAEVLTEPDAYIGELFISVDKAREQftQEYGHSPEREMGPIAVHGFLHINGYDHYT 142 

Query: 121 PEEEKEMFSLQEEILTAYGLKR 142 

P+EEKEMFSLQEEIL AYGLKR 
Sbjct: 143 PQEEKEMFSLQEEILDAYGLKR 164 

There is also homology to SEQ ID 120. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 2186

A DNA sequence (GBSx2303) was identified in S.agalactiae <SEQ ID 6751> which encodes the amino 
acid sequence <SEQ ID 6752>. Analysis of this protein sequence reveals the following:

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.39 Transmembrane 108 - 124 ( 100 - 131)
INTEGRAL Likelihood = -8.92 Transmembrane 61 - 77 ( 52 - 82)
INTEGRAL Likelihood = -5.36 Transmembrane 41 - 57 ( 40 - 60)

Final Results

bacterial membrane --- Certainty=0.7156 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC38047 GB:AF000954 diacyglycerol kinase [Streptococcus mutans]
Identities = 107/133 (80%), Positives = 121/133 (90%), Gaps = 2/133 (1%)

Query: 1 iymiOTM--IffiKKWKlSIRTLTSSNIEFAVTGIFTAFkEEiaSI»^^ 58 
20 MDL DN + KKWKNRTLTSS+EFA+TGIPTAFKEEKMM+KH VSA+L ++AGL F+VS+ 

Sbjct: 3 MDLRimQSQKKWKNRTLTSSLEFALTGIFTAPKEEramKKHAVSAIjLAVIAGLVFiWBV 62 

Query: 59 VEWLPLLLSIPLVITPEIINSAIENVVDLASNYHFSMLAKNAKDMAAGAVLWSLF^^ 118 
+EWLPLLLSIFLVITFEI+NSAIENVVDLAS+yHFSMrjiJKNAKDMaAGAVIiV+S FA L 
25 Sbjct: 63 lEWLFLLLSIFLVITFEIVNSAIENVVDLaSDYHFSMLRKNAKDMaAGAT^^ 122 

Query: 119 GLIIFIPKILALL 131 

GLIIF+PKI LL 
Sbjct: 123 GLII5VPKIWFLL 135 


A related DNA sequence was identified in S.pyogenes <SEQ ID 6753> which encodes the amino acid 
sequence <SEQ ID 6754>. Analysis of this protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-10.67 Transmembrane 63 - 79 ( 41 - 84)
INTEGRAL Likelihood = -7.32 Transmembrane 110 - 126 ( 105 - 129)
INTEGRAL Likelihood = -5.41 Transmembrane 43 - 59 ( 41 - 62)

Final Results

bacterial membrane --- Certainty=0.5267 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC38047 GB:AF000954 diacyglycerol kinase [Streptococcus mutans]

Identities = 104/135 (77%) , Positives = 119/135 (88%) 

Query: 1 MALHIOTrTKRKIJKNRTITSSLBFALTGVFTAFKEERNIiRSHLLSACLACVAGLFFSISA 60 
M L ON +++KWKNRT+TSSLEFALTG+FTAFKEERN++ H +SA LA +AGL F +S 
50 Sbjct: 3 MDLRDNKQSQKKWKNRTLTSSLEFALTQIFTAFKEEENMKKHAVSALLAVIASLVFKVSV 62 

Query: 61 IEraiFLLLAIFLVITLEIWSAIEN\nroLASDYHFSMIiAKNAKDMAAGAVIiMISG2AVLT 120 
lEWLFLLL+IFLVIT EIVNSAIEHWDLASDYHFSMLAKNAKDMAAGAVL-HSG+A LT 

Sbjct: 63 lEWLFLLLSIFLVITFEIWSAIENVVDLASDYHPSMLAKNAKDMAAGAVLVISGFAALT 122 

Query: 121 GLIIFIPKIVmiFVH 135 
GLIIF+PKIW + H 

Sbjct: 123 GLIIFVPKIWFLLFH 137 



An alignment of the GAS and GBS proteins is shown below.

Identities = 98/129 (75%), Positives = 115/129 (88%), Gaps = 2/129 (1%)

Query: 1 MDLNDlm--HKKWK^ffiTLTSSMEFAVTGIFTAFKEERN^mHLVSAILVILaGLTFQVSM 58 

M L+DNN +KWKNRT+TSS+EFA+TG+FTAFKEERM+R HL+SA L +AGL F +S 
Sbjct: 1 ^mLHD]™TTialKWKNRTITSSLSFALTGVFTAFKEERNLRSHLLSACIlACVAGLFFSISA SO 

Query: 59 VEWLFLLLSIPLVlTFEIINSAIENVVDIiASiraiFSMLAKIS^^ 118 

+EWI.FI1LI1+IPLVIT EI+WSAIENVVDIAS+WSMDftKNftKDMMGaVL++S +AVL 
Sbjct: 61 IEWLFLLLAIPIlVlTLEIVHSAIENVVDIJ^DTHFSMIAK^IaKDMAAGAVLM 120 

Query: 119 GLIIFIPKI 127 

6LIIFIPKI 
Sbjct: 121 GLIIFIPKI 129 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2187 

A DNA sequence (GBSx2304) was identified in S.agalactiae <SEQ ID 6755> which encodes the amino 
acid sequence <SEQ ID 6756>. This protein is predicted to be GTPase Era (era). Analysis of this protein
sequence reveals the following:

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1871 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10017> which encodes amino acid sequence <SEQ ID
10018> was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD41632 GB:AF072811 GTPase Era [Streptococcus pneumoniae]
Identities = 273/299 (91%) , Positives = 290/299 (96%) 

Query: 16 MTFKSGFVAILGRPJWGKSTFrjraVMGQKIAIMSDKAQTTRNKIMGIYTTETEQIVFIDT 75 

MTFKSGFVAILGRENVGKSTFIMWMGQKIAIMSDKAQTTRNKIMGIYTT+ EQIVFIDT 
Sbjct: 1 MTFKSGFVAILGRENVGKSTFLHHVMGQKIAIMSDKAQTTRNKIMGIYTTDKEQIVFIDT 60 

Query: 76 PGIHKPKTALGDPMVESAYSTLREVETVLFMVPADEKRGKGDDMIIERLKAAKIPVILVI 135 

PGIHKPKTALGDPMVESAYSTLREV+TVLFMVPADE RGKGDDMI 1ERLICAAK+PVILV+ 
Sbjct: 61 PGIHKPKTALGDFMVESAYSTLREVDTVLFWPADEARGKGDDMIIERLKAAKA'PVILW 120 



Query: 136 NKIDKVHPDQLI.EQIDDFRSQMDFKEVVPISAI^JGNNVFrLIKLLTDNLEEGFQYFPEDQ 195 

NKIDKVHPDQLL QIDDFR+QMDFKE+VPISALQGNNV L+ +L++NI,+EGFQYFP DQ 
Sbjct: 121 NKIDKVHPDQririSQIDDFRWQMDFKBIVPISALQGNNVSRLVDILSENLDEGPQYFPSDQ 180 

Query: 196 ITDHPERPLVSE^^VKEKVLHLTQQEVPHSVRVVVESMKRDEETDKVHIRATIMVERDSQK 255 

I™PERFLVSEMVREKVI^HLT^-+E+PHSVaVW■^SMKro^^ 
Sbjct: 181 ITDHPERFLVSEimEKVLHLTREEIPHSVAVVVDSMKRDEETDKWHIRATIMVERDSQK 240 

Query: 256 GIlIGKQG2mKKIGK^IaRRDIELMIfiDKVYLETWVKVKKNWRDKKM 314 

GIIIGK GAMLKKIG MAERDIEM^DKV+LETWVKVKKNVffiDKKIIJIM 
Sbjct: 241 GIIIGKGG!mKKIGSMftRRDIEIJttGDKWLETWVKVKKNm)K^ 299 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6757> which encodes the amino acid
sequence <SEQ ID 6758>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 295/297 (99%) , Positives = 296/297 (99%) 

Query: 18 FKSGFVAIIXSRtWGKSTFIJSimiGQKIAIMSDKaQTTRN 77 

PKSGPVRlLGRENTOKSTFIiNHVMGQKIAIMSDKAQTTRNKIMGIYTTETEQIW 
Sbjct: 2 FKSGFVaiLGRPNVGKSTFUmMGQKiaiMSDKAQTTKSIKIMGIOT 61 

Query: 78 IHKPKTAIK3DF^IVESAySTLREVETVLFMVPADBKEGKGDDMIIEH.KaaKIPV^ 137 

IHKPKiaLGDFMVESAYSTLKEVETVLFMVPMEKRGKGDDMIIEIUiKZiAKIPVIL^^ 
Sbjct: 62 IHKPKTMCDFMVESAYSTLREVETVLFMVPMEKRGKGDDMIIERLKAAKIPVILVIM 121 

Query: 138 IDKVHPDQ]:iLEQIDDFRSQ^mFKEVVPISaI<JGlnOTTLIKI:JLTImlEH^ 197 

IDKVHPDQLLEQIDDK SQMDFKEVVPISAL+GimVPTLIKLLTDNLEEGFQyFEEDQIT 
Sbjct: 122 IDKVHPDQLLEQIDDPHSQM)FKEVVPISALE6NimTIiIK]:iTDlII.EE6FQYFPEDQIT 181 

Query: 198 DHPEKFLVSEMVREKAnLHLTQOEVPHSVAVVVESMKRDEETDKVHIRATIMVERDSQTO 257 

DHPERFLVSE^ro^EKA^JHLTQQEVPHSVAVVVESMKRDEETDKUHIRATIMVERDSQRGI 
Sbjct: 182 DHPERFLVSEIWREKVLHLTQQEVPHSVAVVVESMKRDBETDKVHIRATIMVERDSQKGI 241 

Query: 258 IIC3KQGAMLKKIGKMARRDIELMLGDKVYLETWVKVKiaSIWFmKKLDIMFGYNE^ 314 

I IGKQGflMLKKIGKMARRDIELMI^DKVYLETWVIOTCKlJWRDKKLDIM 
Sbjct: 242 IIaKQGftMLKKIGKMARRDIE]:MLGDKVYLETWVKVKK^Ma^KK^ 298 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2188 

A DNA sequence (GBSx2305) was identified in S.agalactiae <SEQ ID 6759> which encodes the amino 
acid sequence <SEQ ID 6760>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2679 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2189 

A DNA sequence (GBSx2306) was identified in S.agalactiae <SEQ ID 6761> which encodes the amino
acid sequence <SEQ ID 6762>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have a cleavable N-term signal seq.

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA16793 GB:D90900 hypothetical protein [Synechocystis sp.]
Identities = 36/119 (30%) , Positives = 57/119 (47%) , Gaps = 15/119 (12%) 

Query: 390 TSDYEKZUCVIHDHLVimTYATEEIATTKETASGISIHMEALYKDiaiGVCC^ 449 

++D+E+ft++ + + N y +A TR I PE + +C ++ +++ 

Sbjct: 153 SKDWEEffiRLAYSWITQNIRYDVP-MAETRN IDDLRPETVLARGETICSGYSNLYQA 207 

Query: 450 MAATAGLSVWYVTGQAGGG NHAWNI\T:NGVKYYVDTTWDW1QIKSNKYF 498 

+A GL V + G A GG NHAWK V I+G Y +DTTW I S+ P 

Sbjct: 208 IiAKELGIjDWIIEGFAIGSGDVIVGDDPDVmzVWNQVICIDGQWYLLDTTWGAGIVSDGKF 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6763> which encodes the amino acid 
sequence <SEQ ID 6764>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 41/181 (22%) , Positives = 79/181 (42%) , Gaps = 17/181 (9%) 

Query: 355 1TITYTLKGDMVGLHKEYKQFVDSFVKENITNKNITSDYEKAKVIHDHLVNNYTYATE-- 412 

+ +T+ + D ++++ 0 + + ^ N +K+ YE+ K ++ ++ + Y + 
Sbjct: 124 VFVTFPIPEDRJCNIYQDIi-QAJGNDXVANTPSKD RYEQVKYFYEVIIRDTDYNKKAF 179 

Query: 413 ELATTRETASGISIHAPEaLYKDKRGVCQAFAVMFKDMaATAGLSVWYVTGQAGGGN--- 469 

E + A S ++++ D VC +A F+ + AG+ V Y+ G 
Sbjct: 180 EAYQSGSQAQWiSNQDIKSOTIDHLSVCMGYAQRFQFLCQKAGIFWAYIRSTGTSQQPQQ 239 

Query: 470 

Sbjct: 240 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2190 

A DNA sequence (GBSx2307) was identified in S.agalactiae <SEQ ID 6765> which encodes the amino 
acid sequence <SEQ ID 6766>. This protein is predicted to be rgg protein. Analysis of this protein sequence
reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 187 - 203 ( 187 - 203)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10015> which encodes amino acid sequence <SEQ ID
10016> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 KELGKTLRRLRKGKKVSISSLflDEHLSKSQISRFERGESEITCSRLmiLDKLNITIDEF 67 

K GK L+ +R+ K +S+ +A +S +Q+SR+ERG S +T + L +++++ EF 

Sbjct: 5 KSSGKIIiKIIRESKiraSrjKBVRaGDISVAQIjSRYERGISSLTVDSFYSai^ 64 

Query: 68 VSI-HSKRHTHFFILUSJRVRKYCaEKKVTKLVALL EDHNHKDYEKIMIK 115 

+ H+ +L ++ + E N+ KL ++L E N+K 

Sbjct: 65 QYWHNYREADDVVLSQKLSEAQREmiVKLESILAGSEAMAQEFPEKKireK-IiNTIVIR 123 

Query: 116 ALIFSIDQSIEPNQEEI:J«lLTDYLPTVEQWGYYEIILLGNCSRLINYHTLFLLTKE^WNS 175 

A + S + + ++ ++ LTDYLF+VE+WG YE+ L N L+ TL EM+N 
Sbjct: 124 ATLTSCNPDYQVSKGDIEPLTDYLPSVEEWGRYELWLFTNSVNLLTLKTLETFASEMINR 183 

Query: 176 FAYSEQNKIMKILVTQLMNCLIISIDHSYFEHSHYLIDKTOSLtODEWFYEKTVFLYV 235 

•I- N+ + ++ +N + I++++ + + ++ + + E + Y++ + Y 

Sbjct: 184 TQFYHNLPEOTlRRIiroaUJWSACIENOTILQVai^ 243 

Query: 236 TGYYHLKLGDTSSGKEDMRKaLQIFKYLGEDSF 268 

Y K+G+ •+ + D+ + L F+YL DSF 
Sbjct: 244 KaLYSYKVGNPHA-RHDlEQCLSTPEYL--DSF 273 

There is also homology to SEQ ID 628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
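
Several of the analyses above report predicted transmembrane segments on "INTEGRAL" lines, each giving a likelihood score, a segment range and a wider range in parentheses. The following is a rough sketch, assuming that line layout, of how those lines could be tabulated. The function name `parse_integral` and the tuple layout are illustrative assumptions only; the sample lines are copied from Examples 2186 and 2190.

```python
import re

# Hypothetical pattern for lines like
# "INTEGRAL Likelihood = -8.92 Transmembrane 61 - 77 ( 52 - 82)".
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(text):
    """Return (likelihood, start, end, outer_start, outer_end) per line."""
    segments = []
    for m in _INTEGRAL.finditer(text):
        likelihood = float(m.group(1))
        start, end, outer_start, outer_end = (int(g) for g in m.groups()[1:])
        segments.append((likelihood, start, end, outer_start, outer_end))
    return segments


sample = """
INTEGRAL Likelihood =-15.39 Transmembrane 108 - 124 ( 100 - 131)
INTEGRAL Likelihood = -8.92 Transmembrane 61 - 77 ( 52 - 82)
INTEGRAL Likelihood = -0.16 Transmembrane 187 - 203 ( 187 - 203)
"""
segments = parse_integral(sample)
for likelihood, start, end, _, _ in segments:
    # Residue positions are inclusive, so the length is end - start + 1.
    print(f"{start}-{end}: {end - start + 1} residues, likelihood {likelihood}")
```

Each of the segments listed in this sample spans 17 residues when the end points are counted inclusively.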

Example 2191 

A DNA sequence (GBSx2308) was identified in S.agalactiae <SEQ ID 6767> which encodes the amino 
acid sequence <SEQ ID 6768>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3234 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA05066 GB:D26071 formamidopyrimidine-DNA glycosylase
[Streptococcus mutans]
Identities = 182/271 (67%) , Positives = 217/271 (79%) 

Query: 1 MPELPEVETVRKGLERLVVlJQElASITIKVPKMVKTDIJSlDFMISLPGKriQQVLRRGKYL 60 

MPELPEVETVR+GLE L+V ++I S+ ++VPKMVKT + DF + + G+T + + RRGKYIi 
Sbjct: 1 MPELPEVETVRRGLEHLIVGKKIVSVEVRVPKIWErGVEDFQLDILGQTPESIGRRGKYL 60 

Query: 61 LFDFGEMOT1VSHLRMEGKYLLFENKVPDNKHFHr.YFKLTKGSTLVYQDVR^ 120 

L + t+SHLRMEGKYLLF ++VPni!IKHFHL+P L GSTLVYQDVRKFGTEEL+ K 

Sbjct: 61 IiIJIIJ«lQTIISHLRlffiGKYLIiFEDEVPimHFHLFPGLI)GGSTLVY^ 120 

Query: 121 SSLKDYFTQKKLGPEPTJVOTFQFEPPSKGUUJSKKPIKPIJ^QRLVaGI^IY^ 180 

S ++ YF QKK+GPEP A F+ +PP tGLA S K IK LLLDQ LVAGLGNIYVDEVLW 
Sbjct: 121 SQVBAYFVQKKIGPEPNRKDFKLKPFEEGLRKSHKVIKTLLnDQHLVRGLGNIYVDEVLW 180 

Query: 181 AAKIHPQRIOTQLTESETSLLHKEIIRILTIiGIEKGGSTIRTYKNaLGEDGTMQKYLQVY 240 
AAK+ P+RLR+QL SE +H E IRIL L lEKGGSTIR+YKN+LGEDG+MQ LQVY
Sbjct: 181 AAKVpPERIASQLKTSEIKRIHDKTIRILQLAIEKCSGSTIRSYKNSLGEDGSMQDCLaVY 240 

Query: 241 GKTGQPCPRCGCIiIKKIKTOGRGTHYCPRCQ 271 
5 GKT QPC RC I+KIKVGGRGTH+CP CQ 

Sbjct: 241 GKTDQPCaRCATPIEKIKVGGRGTHFCPSCQ 271 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6769> which encodes the amino acid 

sequence <SEQ ID 6770>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2068 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/271 (70%) , Positives = 229/271 (84%) 

Query: 1 MPEtPEVETVRKGLERLVVlCEIASITIKVPKMVKTDLNDFMISLPGKTIQQVLRRG 60
MPEIiPEVKTVR+GLE LV+ QEI ++T+KVPKMVKTDL P ++LPG+ IQ V RRGKYI.
Sbjct: 1 MPELPEVETWRGLETLVLaQEl\ffiVTLKVPKMVICrDLETFALTLPGQI IQSVGRRSKYL 60

Query: 61 LFDFGEMVMVSHLRMEGKYLLFPNKVPDNKHFH]:.YFKLTNGSTLVYQDVRKFGTPELVRK 120
L D G++V+VSHLRME6iQrLLFP++VPDNKHFH++F+L NGSTLVYQDVEKFGTP+L+ K
Sbjct: 61 LIDLGQLVLVSHLRMEGKYLLFPDEVPDNKHFHVFFELKNGSTLVYQDVRKFGTPDDIAK 120

Query: 121 SSLKDyFTQKKLGPEPTADTFQFEPPSKGLftNSKKPIKPLLLDQRLVAGLGNIYVDEVLW 180
S r. +F ++KLGPEP +TF+ + P L +S+KP1KP IiLDQ LVAGLGNIYVDEVLW
Sbjct: 121 SQLSRFFAKRm3PEPKKETFKIiKTFE2iALLSSQKPlKPHriI^CrrLTOGI^ 180

Query: 181 AAKIHPQRIANQLTESETSmHKEIIRILTLGIEKGGSTIRTmKLGEDGTMQKyLQ^ 240
AAK+HP+ ++E IiH E IRIXi LGlEKGGST+RTY+NaiiG DGTMQ YLQVY
Sbjct: 181 AAKVHPKTASSRIiNKaEIKRLHDETIRIMLGIEKGGSTVRTYRNALGftrK™ 240

Query: 241 GKTGQPCPRCGCLIKKIKVGQRGTHYCPRCQ 271
G+TG+PCPRCG I K+KVGGRGTH CP+CQ
Sbjct: 241 GQTGKPCPRCGQAIVKLKVGGRGTHICPKCQ 271


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2192 

A DNA sequence (GBSx2309) was identified in S.agalactiae <SEQ ID 6771> which encodes the amino 
acid sequence <SEQ ID 6772>. Analysis of this protein sequence reveals the following:

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0797 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10013> which encodes amino acid sequence <SEQ ID
10014> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC00353 GB:AF008220 YtaG [Bacillus subtilis]
Identities = 80/189 (42%), Positives = 113/189 (59%), Gaps = 1/189 (0%)

Query: 8 MTKIIGLTCMIMGKSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLCSPEILD 67 
MT +IGLTGGIASGKSTV ++ E G VIDAD + + KG Y+ +++ G +IL 
5 Sbjct: 1 MTLVIGLTGGIASGKSTVMOT^IEKGITVinaDlIAKQAVE 60 



Query: 68 WSGELDRPKLSQMIFMJPDimKTSARLQISrSIIRQELACQRDQLKQTEEIF-EMDIPLLIE 126 

++G++DR KL ++F N + + +RQE+ +RD+ E F +DIPLL E 

Sbjct: 61 SNGDIDRKKLGaLVFTIiEQKRIALNAIVHPATOQEMiajRRDEAVfl^^ 120 



Query: 127 E 

K D+I +V V KE QL+RLM RN + EEA R+ QMPL +K + A +inN+G 

Sbjct: 121 SiaJESLVDKIIWSVTI^LQLERLMKRNQLTEEEAVSRIRSQMPLEEKTARftDQVIDNSG 180 

Query: 187 DLITLKEQI 195 

L K Q+ 
Sbjct: 181 TLEETKRQL 189 

A related sequence was also identified in GAS <SEQ ID 9111> which encodes the amino acid sequence
<SEQ ID 9112>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 59 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.101 (Affirmative) < succ>
bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 118/191 (61%) , Positives = 153/191 (79%) 

Query: 9 TKIIGLTGGIASGKSTVTKIIRESGPKVIDADQVVHKLQAKGGKLYQaLLEWLGPEIim 6 
T IIG+TGGIASGKSTV K+IR++G++VinADQVVH LQ KGG+LY+AL E G +IL A 



Query: 69 DGELDRPKLSQMIPANPnNMKTSaRLQNSIIRQELACQRDQLKQTEElFFMDIPLLIEEK 128 

DGEIiDR KLS+M+F+NPENM TS+ +aN II++ELA +RD L Q++ IFFMDIPHiL+E 
Sbjct: 69 DGELDRTKMEMLFSNPnNmTSSAIQMQIIKEELAAKl^DHrAQSQAIFFMDIPLUffiLG 128 

Query: 129 YIKWFDEIWLVFVDKEKQLQRLMARHNYSREEftELRLSHQMPLTDKKSFASLIIDNNGDL 188 

Y WFD IWLV+VD + QLQRLMARN + +A R++ Q+P+ +KK +ASL+IDN-K3D+ 
Sbjct: 129 YQDWFDAIW.VYVnAQaX3LQRIJlM!NRIJDKGKARQRIASQLPIEEKKPY^ 188 

Query: 189 ITLKEQILDAL 199 

L +Q+ AL 
Sbjct: 189 AALIKQVQSAL 199 



A related GBS gene <SEQ ID 8993> and protein <SEQ ID 8994> were also identified. Analysis of this 
protein sequence reveals a signal peptide at residues 1-16. 

The protein has homology with the following sequences in the databases:

42.2/60.6% over 189aa 

OMNI|NT01BS3382| Insert characterized
ORF02237(319 - 885 of 1206)
OMNI|NT01BS3382(3 - 192 of 200) ()
%Match = 17.0
%Identity = 42.1 %Similarity = 60.5
Matches = 80 Mismatches = 74 Conservative Sub.s = 35


78 108 138 168 198 228 258 288 

KN3PTAFG*SIDRI*NKLITQGNYSHFNFRHRKRra,iro*NI*ECSTOGRYDAKVFTGLW*NmTVSKV^ 
RDALIiPSVSMLMTKI IGLTGGIASGKSTVTKI IREBGFKVIDiUDQVVHKLQaKCk3KlYQAL]:jEWI.GPEILnaDGELDRPK
1=1 :|||lllllllllll I I IJIII : : II 1= =1 =11 ::|==|| I 

VDLLTLVIGLTGGIASCSKSTVANMLIEKGITVIDflDI lAKQAVEKGMPAYRQIIDEFGEDILLSNGDIDRKK 



LSQMIFflNPDNMKTS^ffiLQNSIIRQELACQRDQLKQTEEIFF-miPLLIEEKyiKWPDEIWLVFVDKEKQIX2RLMAElJN 
I ==l I = : =111= =11= I 1 =11111 1 1 1=1 =1 1 II 11=111 II 

LGALVFTNEQKRLMiNAIVHPAWQEMrilffiRDEA.VAmEAFWLDIPLLFESKLESLVDKIIWSVTKELQLERLMK^ 
90 100 110 120 130 140 150 

795 825 855 885 915 945 975 1005 

YSREEAELRLSHQMPLTDKKSFASLIID!»IGDLITLKEQlUDALQRL*ira*MDHVFIHPLSLLH*F*KrCD*TTO 

: III 1= Mil =1 = I =111=1 1 I 1= = = 
LTEEEAVSRIRSQMPLEKKTARADQVIDNSGTLEETKRQLDEIMNSWA 
170 180 190 200 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6773> which encodes amino acid sequence 
<SEQ ID 6774>. An alignment of the GAS and GBS sequences follows: 



Query: 25 WKVIRKAGYQVIDaDQWHDLQEKGGRLYEALREAFGNQILKaDGELDRTKLSEMLFSN 84 

V K+IR++G++VIDaDQWH LQ KGG+LY+AL E G 4IL ADGELDR KLS+M+F+N 
Sbjct: 20 VTKIIRESGFKVIDADQWHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMIFAN 79 

Query: 85 PDNMATSSAIQNQIIKEELAAKRDHLAQSQAIFFMDIPLLMELGYQDWFDAIWLVYVDAQ 144 

PDNM TS+ +QN II++ELA +RD L Q++ IPFMDIPLL+E Y WFD IWLV+VD + 
Sbjct: 80 PDNMKTSaRLQNSIIRQEIACQRDQLKQTEEIFFMDIPLLIEEKYIKMPDEIWLVFVDKE 139 

Query: 145 T(a^QRLMaRNRI£lKGKflRQRIASQLPIEEKKPYASLVIDNSGDIAALIKaVQSAL 199 

QLQRLMARN + +A R++ Q+P+ +KK +ASI,tinN+GD+ L +Q+ AL 
Sbjct: 140 RQMRIJlARNNySREEAELRLSHQMPLTDKKSPASLlintmGDLITLKEQILnaij 194 ' 

SEQ ID 8994 (GBS245) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 61 (lane 6; MW 23.7kDa). It was also expressed in E.coli as a GST-fusion
product, and purified GBS245-GST is shown in Figure 211, lane 6.

The purified GST fusion product was used to immunise mice and the resulting antiserum was used for
FACS (Figure 278). This confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2193 

A DNA sequence (GBSx2310) was identified in S.agalactiae <SEQ ID 6775> which encodes the amino
acid sequence <SEQ ID 6776>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4073 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA30330 GB:AP000005 253aa long hypothetical ATP-binding
transport protein [Pyrococcus horikoshii]
Identities = 78/240 (32%), Positives = 130/240 (53%), Gaps = 13/240 (5%)



Query: 3 LVIRDIRKRPQETEVLRGASYRFYSGKITGVIXSRNGaiGKTTLFNIIiTODIJ^ 62
+++ ++RK+F EVL+6 ++ G+I G+LG NG+GK+T IL G + G + +
Sbjct: IIVSainjRKKFGSKEVLKBINFTVMXSElYGLLGPNGSGKSTTMRI 61

Query: 63 -KImEYPLTDKDl-GIVYSE^^fLEEFLTGYEFVKPY^mLH--PSDDL-MTIDDYLDF^1E 117
D P+ K+I GV LELTEFF + PDIi + +D
Sbjct: 62 GVDVSRDPMKVKEIVGYVPETPALYESLTPAEFFSFIGGVRRIPQDIIEERVKRLVKiPG 121

Query: 118 1GQTERHRIIKC4YSDC*1KSKLSI1,ICLMISKPKVIIJ:jDEPLTAVDWSSIA1KRLLLELSE 177
IG+ +++I S G K K+SLI ++ P+V++LDE + +D S+ + LL E E
Sbjct: 122 IGK-YMNQLIGTLSFGrKQKISLISALLHDPQVLILDEAMNGLDPKSaRIFRBLLPEFKE 180

Query: 178 D-HIIILSTHIMALAEDLCDIVAVLDKGKL- - -QTLDIDR---KHEQFEERLLQVLKGDE 230
+ 1+ STHI+ALAE +CD + ++ +G+- T+D R + E+ E+ L++ + E
Sbjct: 181 EGKSIVFSTHILftLAEVMCDRIGIIYEGRIVAEGTIDELREIAREEKLEDIFLKLTQAKE 240



There is also homology to SEQ ID 2876.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2194 

A DNA sequence (GBSx2311) was identified in S.agalactiae <SEQ ID 6777> which encodes the amino 
acid sequence <SEQ ID 6778>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
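The "Final Results" blocks in these examples all follow the same three-line pattern, so the predicted location can be read off mechanically. The following is only a minimal sketch: it assumes the lines have first been cleaned to the form `bacterial cytoplasm --- Certainty=0.6138 (Affirmative) < succ>`, and the function names `parse_final_results` and `top_site` are hypothetical helpers, not part of any prediction program.

```python
import re

# Matches one cleaned "Final Results" line (assumed layout, see lead-in), e.g.
# "bacterial cytoplasm --- Certainty=0.6138 (Affirmative) < succ>"
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s*-*\s*"
    r"Certainty\s*=\s*(?P<score>\d*\.\d+)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every result line found in text."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        results[match.group("site")] = (
            float(match.group("score")),
            match.group("call"),
        )
    return results


def top_site(results):
    """Return the site name with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Values taken from the Final Results block of Example 2194.
example = """
bacterial cytoplasm --- Certainty=0.6138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(top_site(parsed), parsed[top_site(parsed)])
```

Lines that are still damaged (for example a certainty printed as "0. 6138") would not match this pattern and would need cleaning first.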



Example 2195 

A DNA sequence (GBSx2312) was identified in S.agalactiae <SEQ ID 6779> which encodes the amino 
acid sequence <SEQ ID 6780>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.34 Transmembrane 526 - 542 ( 511 - 546)
INTEGRAL Likelihood = -9.61 Transmembrane 340 - 356 ( 335 - 359)
INTEGRAL Likelihood = -8.17 Transmembrane 455 - 471 ( 451 - 476)
INTEGRAL Likelihood = -8.01 Transmembrane 97 -     ( 95 - 121)
INTEGRAL Likelihood = -3.01 Transmembrane 216 - 232 ( 207 - 236)
INTEGRAL Likelihood = -3.40 Transmembrane 50 - 66 ( 45 - 67)
INTEGRAL Likelihood = -1.33 Transmembrane 178 - 194 ( 178 - 194)

Final Results

bacterial membrane --- Certainty=0.7135 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10011> which encodes amino acid sequence <SEQ ID
10012> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 376. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
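The transmembrane predictions listed for this protein pair a likelihood score with two residue ranges per segment. As a hedged illustration only, the sketch below parses rows of the form `INTEGRAL Likelihood = -9.61 Transmembrane 340 - 356 ( 335 - 359)`. The names `Segment`, `core_*`, `ext_*` and `parse_segments` are hypothetical, and reading the first range as the segment core and the bracketed range as an extended region is an assumption rather than something the text states.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    likelihood: float
    core_start: int   # first range on the row (assumed: core segment)
    core_end: int
    ext_start: int    # bracketed range on the row (assumed: extended region)
    ext_end: int

    @property
    def core_length(self):
        """Number of residues in the first range, counting both ends."""
        return self.core_end - self.core_start + 1


ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Parse all complete rows and return them ordered along the sequence."""
    segments = [
        Segment(float(m[1]), int(m[2]), int(m[3]), int(m[4]), int(m[5]))
        for m in ROW.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.core_start)


# Three complete rows taken from the listing for SEQ ID 6780.
rows = """
INTEGRAL Likelihood =-15.34 Transmembrane 526 - 542 ( 511 - 546)
INTEGRAL Likelihood = -9.61 Transmembrane 340 - 356 ( 335 - 359)
INTEGRAL Likelihood = -8.17 Transmembrane 455 - 471 ( 451 - 476)
"""
segments = parse_segments(rows)
for seg in segments:
    print(seg.core_start, seg.core_end, seg.core_length, seg.likelihood)
```

Rows with a missing coordinate are skipped by the pattern rather than guessed.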



Example 2196 

A DNA sequence (GBSx2314) was identified in S.agalactiae <SEQ ID 6781> which encodes the amino 
acid sequence <SEQ ID 6782>. Analysis of this protein sequence reveals the following: 



Possible site: 32

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -      Transmembrane 140 - 156 ( 134 - 160)
INTEGRAL Likelihood = -      Transmembrane 255 - 271 ( 253 - 274)
INTEGRAL Likelihood = -      Transmembrane 345 - 361 ( 343 - 363)
INTEGRAL Likelihood = -      Transmembrane 184 - 200 ( 183 - 202)
INTEGRAL Likelihood = -      Transmembrane 66 - 82 ( 55 - 83)
INTEGRAL Likelihood = -      Transmembrane 221 - 237 ( 221 - 239)
INTEGRAL Likelihood = -      Transmembrane 121 - 137 ( 121 - 137)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9401> which encodes amino acid sequence <SEQ ID 9402> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA07482 GB:AJ007367 multi-drug resistance efflux pump 

[Streptococcus pneumoniae] 
Identities = 213/372 (57%), Positives = 295/372 (79%) 

Query: 1 MPFMVLYVEQLGAPSNJWEWYAGLSVSLS.zUjSSALVAPLWGRLADKYGRKPMMVRAGLMM 60
+PFM ++VE LG S +V +YAGL++S+SA+S+AL +P+WG LADKYGRKPMM+RftGL M
Sbjct: 28 VPFMPIFVErafiVGSQQyaPYMLAISVSAISaALFSPIMGIIADKYGRKPMMIRAGLaM 87

Query: 61 TFTMGGLJ^'IHSVTGLLILRIIMGIFAGYVHJSTALIASQaPQEESGiMiGTLAra^^ 120
T TMGGLAF+ ++ L+ LR+UJG+PAG+VPN+TALIASQ P+E+SG ALGTL+TGV G

Query: 121 MLIGPLIXMLIJffiWTOIREOTLLVGTILLISTUyiTIFMVKEDFKPlSNEETMPTTEOTKS 180
L GP +GG +AE FGIR VFLLVG+ L ++ ++TI +KEDF+P++ E+ +PT E+F S
Sbjct: 148 TLTGPFIGGFIAELFGIRTVFLLVGSFLFLAAILTICFIKEDFQPVAKEKAIPTKELFTS 207

+L+ LF+TS +IQ SAQSI PIL LY+R LGQTEJIL+FVSGLIVS MGPSS++S+

+G++GD++GNHRLL++A YS ++y+LC+ A + LQLG+ REL+G GTGAL+P +N++

L+K+ P+ G+SR+F++NG+F LG V+6P GSAV+ Gh- VF+ TS V
Sbjct: 328 LSECMTPKAGISRVFAFNQVFFYLGGVVGEmGSAVAGQFGYHAVFYATSLCVAFSCLFNL 387

Query: 361 INFRKVIRVKEl 372 
I FR ++VKEI 
■ Sbjct: 388 IQFRTIiLKVKKI 399 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6783> which encodes the amino acid
sequence <SEQ ID 6784>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.14 Transmembrane 165 - 181 ( 150 - 185)
INTEGRAL Likelihood = -7.43 Transmembrane 371 - 387 ( 367 - 391)
INTEGRAL Likelihood = - .88 Transmembrane 90 - 106 ( 86 - 109)
INTEGRAL Likelihood = -3.35 Transmembrane 145 - 161 ( 143 - 162)
INTEGRAL Likelihood = -1.70 Transmembrane 279 - 295 ( 279 - 297)
INTEGRAL Likelihood = -0.85 Transmembrane 209 - 225 ( 209 - 226)
INTEGRAL Likelihood = -0.27 Transmembrane 3 S3 347 363

Final Results

bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA07432 GB:AJ007367 multi-drug resistance efflux pump
[Streptococcus pneumoniae]
Identities = 236/396 (59%), Positives = 309/396 (77%)

Query: 1 VNWRQNLKVAWLGNFPTGASFSLVMPFMALYVENLGTPTELVEYYAGLAVAVTALASflLF 60
+NW+ NL++AW GNF TGAS SLV+PFM ++VENLG ++ V +yAGLA++V+A+++ALF
Sbjct: INWKimiRIAWFGNFLTCSASlSLVVPFMPIFVEHIjGVGSQQVAFYaGLAISVSAISftALF 63

lASQ PKE+SG ALGTL+TGV AG L GP +GG +AEL GIR VFLLVG LFL +++T

CJT NL+P SGL+VS+MGFSS+ S+ +GKLGD+ GIIHRLL+ A YS I+Y

QLG+ RF +G G GAL+P +N+LL+K+TPK GISRVFA+NQ+F LG V+GP GS VA

GY +VFY TSL V + +++LI FR +KVK+I



An alignment of the GAS and GBS proteins is shown below. 

Identities = 262/373 (70%) , Positives = 314/373 (83%) 

Query: 1 MPF^WLYVEQI^PSNKVEWYAGLSVSLSALSSaLVAPLWGRIADKYGRKPMMVRAGL^lM 60 

MPFM LYVE LG P+ VE+YAGL+V+++AL+SAL AP+WG+LAD+YGRKPMM+RA +M 
Sbjct: 25 MPFMALYVEIsrciGTPTELVEYYAGIAVAVTaiASALFAPVWGKLaDRYGRKPMffl^^ 84 



Query: 61 TFIMGGIAFIHSVTGIiLILRItNGIFAGYVPNSTALIAS(2^QEESGYALGTrATGVTGG 120
TFTMC3GIA I +V LLILR+L G+ AGYVPN+TALlASQJ^+EESGXaLGTIATOVT G
Sbjct: 85 TFT^ra3La,IIE>HVFV&LILRLLTGVSAGYVPNATMlIASQ6PKEESGYA]OTLATO 144 

Query: 121 MLlGPLIKKSLIJOTEGIREVFLLVGTILLISTLKriFMVKEDFKSISSffi 180 

LIGPLLGG+LAE GIR+VFLLVG IL + +LMT VKE+FKP+ E +Pr + K 
Sbjct: 145 ADIGPLLGGIIJffiLLGlRQ^WLLVGVILFLCSIWTAVmCEEFKPVl^FEMIFrKVIL^^ 204 

Query: 181 VKSLaiLIGLPVTSMIIQISaQSIAPILTLYIRHLGQTENLMFVSGI.lVSGMGFSSILSS 240 

VKS. QI++GLPVTSMIIQ1SAQS+APIL+LYXRHLGQT NLMF SGL+VS MGFSS+ SS 
Sbjct: 205 VKSPQI^!lr<3I,PVTSMIIQISAQSVAPILSLYlIa^IX3QTH^nJ^FTSGLWSftMGFSSLFSS 264

Query: 241 PKLGRIGnRIGNHRLLLIALLYSFLMYVLCSLAQTSr.QLGVIRPLYGFGTGALMPSINSI 300 

LG++GDR GNHRLLL AL YSF+MY +J,RQTS QLGV+RP YGFG GALMPSINS+ 
Sbjct: 265 SYLGKLGDRFGNHRLLLAALCYSFIMYFSSALAQTSFQLC-VLRFAYGFGVGALMPSIKfSL 324 

Query: 301 LTKIAPRQGLSRIFSYKQMFSNLGQVLGPFVGSAVSIHLC-FRVJVFFVTSFIVLANFVWCF 360 

LTK+ P++G+SR+F+Y1SIQMFSNLGQV+GPF+GS V++ LG+R VF+VTS IV N +W 
Sbjct: 325 LTKLTPKEGISRVFAYNQMFSNLGQVIGPFIGSWAVVLGYRSVFYVTSLIVFVNLIWSL 384 

Query: 361 INFRKYIRVKEIV 373 

I FRKYI+VK+IV 
Sbjct: 385 IIFRKYIKVKDIV 397 
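The "Identities" figure quoted for an alignment is the number of aligned positions at which query and subject carry the same residue. A small sketch of that count is given below, using the last, undamaged block of the alignment above as input. The function name `count_identities` is hypothetical and the treatment of gap characters is an assumption.

```python
def count_identities(query, subject, gap="-"):
    """Count positions where two aligned strings carry the same residue.

    Both strings must already be aligned (equal length); positions where
    either string holds the gap character never count as identical.
    Returns (identical_positions, aligned_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    return identical, len(query)


# Final block of the GAS/GBS alignment (Query 361-373, Sbjct 385-397).
identical, length = count_identities("INFRKYIRVKEIV", "IIFRKYIKVKDIV")
print(identical, length)
```

Summing such counts over every block of an alignment gives the numerator and denominator of the overall Identities value, provided the sequence rows are free of extraction errors.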

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2197 

A DNA sequence (GBSx2315) was identified in S.agalactiae <SEQ ID 6785> which encodes the amino 
acid sequence <SEQ ID 6786>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2343 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB69986 GB:U34356 glycerol kinase [Enterococcus faecalis]
Identities = 156/186 (83%) , Positives = 167/185 (88%) , Gaps = 1/186 (0%) 

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIASSQKEFPQIFPQAGWVEHWANQIWNSVQSVI 62 ' 

+EEKYIMAIDQGTTSSRAIIF+KKG KI SSQKEF Q FP AGWVEHMRN+IWNSVQSVI 
Sbjct: 2 AEEKYIMAIDQGTTSSRAIIFDKKGNKIGSSQKEFTQYFHS&GWVEHMSNEIWNSVQSVI 61 

Query: 63 AGAPIESSIKPGQIEAIGITNQRETTWWDKKTGLPIYNAIVWQSRQTAPIADQLRQEGH 122 

AG+ IBS +KP I IGITWQREITWWDK TGLPIYNAIVWQSRQT PIAriQI.K++G+ 
Sbjct: 62 AGSLIESQVKPTDIAGIGITNQRETTVVWDKaT6LPIYMaiVWQSRQTTPlADQI.KEDGy 121 

Query: 123 TimiHEKTGLVinAYFaATKVRWIIiDHVPGAQERAERGELLPGTIDTOi:iVWKLTD6t.VHV 182 

+ MIHEKTGL+IDAYFSATKVRWILDHV GAQERAE GEL+FGTIDTWLVWKLT G HV 
Sbjct: 122 SEMIHEKTGLIIDAYFSATiWRWIIJ3HVEGaQERAENGEMreTIDTWl,VWKl,T-GOT 180 
Query: 183 TDYSNA 188 

Sbjct: 181 TDYSNA 186 

There is also high homology to SEQ ID 2844: 

Identities = 174/186 (93%), Positives = 182/186 (97%) 

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKICGEKIASSQKEFPQIFPQAGWWEHNaNQIWNSVQSVI 62
S+EKyiMMDQGTTSSRAIIEM+KGEK++SSQKEPPQIFP AGWVEHNaNQMNSVQSVI
Sbjct: 2 SQEKYimiDQGTTSSRAIIFNQKGEKVSSSQKEFPQIFPHAGlWEHNaNQIWKSVQSVI 61

Query: 63 AGAFIESSIKPGQIEAIGITNQRETTVVWDKlCrGi:.PIYNAIVWQSRQTaPIADQLKQEGH 122 

flCSAFIESSIKP QIEAIGITNQRETTVVWDKKTG+PIYNAIWQSRQTRPIA+QLKQ+C3H 
Sbjct: 62 AGRFIESSIKPSQIEAIGITtCPETTVVWDKKTGVPI'SmiWQSRCffAPIAEQL^^ 121 

Query: 123 TIMIHEKTGLVIDAYFSATKVRWIIODHVPGAQERAEKGELLFGTIDTWLVWKLTDGriVHV 182 

T MIHEKTGLVIDAYFSATK+RWIUJHVPGAQERAEKGELLFGTIDTWLVWKLTDG VHV 
Sbjct: 122 TKMIHEKTGLVIiaYFSATKIRWIIiDHVPGaQERTfflKBELLFGTIim^LWKLTDGAVH^ 181 

Query: 183 TDYSNA 188 

TDYSNA 
Sbjct: 182 TDYSNA 187 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2198 

A DNA sequence (GBSx2317) was identified in S.agalactiae <SEQ ID 6787> which encodes the amino 
acid sequence <SEQ ID 6788>. This protein is predicted to be glycyl-tRNA synthetase beta chain (glyS).
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2933 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CM14468 GB:Z99117 glycyl-tRNA synthetase (beta subunit)
[Bacillus subtilis]
Identities = 315/687 (45%), Positives = 447/687 (64%), Gaps = 21/687 (3%)

Query: 3 KDIiLLEIiGDEEBPAYVVTPSEKQIfiQKMVKIT.EDHRLSFETVQIFSTERRLAVRVKHr^ 62 

+DLLLE+GLEE+PA + S QLG K+ +L++ ++ V++F+TPRRLAV VK +A+ 
Sbjct: 4 QDIiLEIGLEEMPARFUffiSMVQICDKLTGWLKEKNITHGiOTa:.ElSlTE^^ 63 

Query: 63 QQTDLTEDFKGPSKKIALDAEGKFSKAAQGFVRGKBLSVDDIEFREVKBEEYVYVTKHET 122 

+Q D+ E+ KGP+KKIALnA+GN++K&A GP +G+G +V+D+ +EVKB EYV+V K + 
Sbjct: 64 KQDDIKEEAKGPAKKIALDAIXSljm'Ka&IGFSKiGQGBUmiDLYIKEVTO 123 

Query: 123 GKSAIDVIASVTEVLTELTFPW^ME^WamSFEYIRPVHTLVVIiDDQ^iLEIJ^PLDIH^ 182 

G+ +L ++ ++T L PP NM W N YIRP+ +V L + 4-4- SGR 

Sbjct: 124 GQETKSLLPELSGLITSLHFPKNMRWGNEDLRYIRPIKWIVJUjFGQDVIPFSITNVESGR 183 

Query: 183 ISRGHRFLGSDTEISSASSYEDDLRQQFVIADAKERQQMIVNQIHAIEEKKNISVEIDED 242 

-^-^aHRFLG + I S S+YE+ L+ Q VIAD R+QMI +Q+ + + N S+ +DED 
Sbjct: 184 TTQGHRFLGHEVSlESPSAYKEQLKGQHVIADPSVRKQMIQSQIiETMaAENNWSIPVDED 243 

Query: 243 LLNEVLNLVEYPTAFLGSFDEKYLDVPEEVLVTSMKNHQRYFWRDRDGICLLPNFISVRN 302 

LL+EV +IjVEYPTA GSF+ -^•^L +PEEVLVT-t-MK HQRYF V■)-D-^+G LLP-fFI+VRN 
Sbjct: 244 LLDEVNHLVEYPTALYGSFESEFLSIPEE-'n.VTTMKEHQRYFPVKDKNGDLLPHFITVRN 303 

Query: 303 GNAEHIENVIKGNEKVLVARLEDGEFFWQEDQKLNIADLVEKLKQVTFHEKIGSLYEHMD 362 

GN•^ lENV +GNEKVL ARL D FF■^+EDQKIJ^1I V-I-KL+ + FHE-H-GSL + ^■ 
Sbjct: 304 GNSHAIENVftRGIffiKVIJUmSDASFFYKEDQKIJSIinaNVKKLENIVF^ 363 

Query: 363 RVKVISQYLftEKRDLSDEEKIAVIiRAASIYKFDIiTCaSlVDEFDELQGIMGEKYALIJV^ 422 

RV I++ lA + ++ V RAA I KFDL+T M+ EP ELQGIMGEKYA 4- GE 
Sbjct: 364 RVTSXAEKIAWLQADEDTLKHVKRaaEISKFDLVTHMIYEFPELQGIMGEKYARMLGED 423

Sbjct: 424 ElWi^VNHtroiPRSAGGETPSTFTGAWAiyiaDKIiDTIASFFSIGVIP^^ 483

Query: 483 TQGIWILEaFGWDIPIJJELVTNIiYGLSP'ASIJJYOTQKEVMaFISflRIEKMIGS^ 541 

GIV ir, W I +BL+T F D N E++ F + R++ ++ + ++ D 

Sbjct: 484 ASGIVAIIiLDKMWGISFEELLT FVOIEKEN- -EliDFFTQRLKYVimEQIRHD 535 

Query: 542 IREAVLESDTYIVSLILBASQALVQKSKEAQYKVSVESLSRAFIinijREKVTHSV^ 601 

+ +AVLES L +Q L QK +K + E+L R ++++K + LF 

Sbjct: 536 VIDAVLESSELEPYSALHKAQVLEQKLGaPGFKETAEaLGRVISISKKGVRGD-IQPDLP 594 

Query: 602 ENNQEKALYQAILSLELTEDMHDNLDK LFJiSPIINDFFDNTMVMTDDEKM 652 

EN K L+ A + + E++ +N K L AL 1+ +FD+TMV+ D+E 4 

Sbjct: 595 ENEYK^KLFDAYQTAK--ENIOENFSKKDYEaaLRSIAALKEPimYFDHTMVIADKIESL 652 

Query: 653 KONRLAILNSLVAKARTVAAENLIOSITK 679 

K NRLA + SL + ++ A N L K 
Sbjct: 653 KANRLAQMVSLADEIKSFANMNALIVK 679 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2835> which encodes the amino acid 
sequence <SEQ ID 2836>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.96 Transmembrane 450 - 466 ( 450 - 466)

Final Results

bacterial membrane --- Certainty=0.1383 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 505/679 (74%) , Positives = 578/679 (84%) 

Query: 1 MTKDLLLELGLEELPAYWTPSEKQLGQKMVKFLEDHRLSFETVQIFSTPRRLAVRVKGL 60 

M+K+LL+ELGLEELPAYVVTPSEKQLQ+++ PL ++RLSFE +Q FSTPRRLAVRV GL 
Sbjct: 1 MSKNLLIELGLEELPAYWTPSEKQLGERLATFLTENRLSFEDIQTFSTPRRLAVRVSGL 60 

Query: 61 ADQQTDLTEDFKGPSKKIALDAEGNFSKAAQGPVRGKGLSVDDIEPREVRGEEYVYVTKH 120

ADQQTDLTEDFKGP+KKIALDA+GNFSKAAQGFVRGKGL-H D lEFREVKGEEYVYVTKH 
Sbjct: 61 ADQQTDLTEDFKGPAKKIALDADGMFSKAAQGFVRGKGLTTDAIEPRKVKGEEYVYVTKH 120 

Query: 121 ET6KSAIDVLASVTEVLTELTFPVMI1HWANNSPEYIRPVHTLWLLDDQALELDFLDIHS 180 

E 6K A +VL VTEVL+ +TFPV+MHWaNNSFEYIRPVHTL VLL+D+ALELDFLDIHS 
Sbjct: 121 ERGKPAKEVLLGVTEVLSAMTFPVSMHMAlsmSFEYIRPVHTLTVIjaro 180 

Query: 181 GRISRGHRFLGSDTEISSRSSYEDDLRQQFVIJfflAKERQQMIVNQIHAIEEKKNISVEID 240 

GR+SRGHRFLG++T I+SA SYE DLR QFVIADAKERQ+MIV QI +E ++ + V+ID 
Sbjct: 181 GRVSRGHRFLGTOTTITSADSYEADLRSQFVIAnAKERQEMIVEQIKTLEVEQGVQVDID 240 

Query: 241 EDLLNEVI>NLWYPTAFLGSFDEKYLDVPEETOVTSMKNHQRYFVVEDRDGKLLPNFISV 300 

EDLLNEVIjNLVE+PTAF+GSF+ KYLDVE>EEVLVTSMKIiIHQRYFVVRD+ G L+PNF+SV 
Sbjct: 241 EDL):OTVLmVEFPTAFMGSFEAKYLDVPEE\niVTSMraraQRYFVVRDQAGHLMPNFVSV 300 

Query: 301 RNGNAEHIEOTIKGIffilK^^VARIiEDGEFFWQEDQKLNIADLVEKtKQVTFHEKIGSLYEH 360

RNGN + lENVIKGNEKVLVARLEDGEPFW+EDQKL lADLV KL VTFHEKIGSL EH 
Sbjct: 301 RNCaSIDQAIENVIKBNEK^^^ZaRLED6EFFWREDQKLQIADLVaKLTIm^FHEKIGSU 360 

Query: 361 hTORVKVISQYLAEKADLSIBEKLAVLRAASIYKFDLLTGMVDEFDEI^KSIMGEKYRLLaG 420 

MDR +VI+ LA++A+I1S EE AV RAA lYKFDLLTGMV EFDELQGIMGEKYALLAG 
Sbjct: 361 MDRTRVIAflSLAXEamiSaEEOTAVDRAAQIYKFDLLTOMWSBEDSLQGIMGEKKaLLaG 420 

Query: 421 EQPAVaAAIREHVMPTSfilXSELPETRVaRILAL&DKFDTLLSFFSVGLIPSGSNDPYALR 480 

E AVa AIREHY+P +A G LPET+VBA+LALA K DTLLSFFSVGLIPSGSNDPYALR 
Sbjct: 421 EDAAVRTAIREHYLPDAAGGALPETKVGAVLALAAKLDTLLSFFSVGLIPSGSNDPYALR 480

Query: 481 RATQGIWILEAFGWDIPLDELVTNLYGLSFASLDYAIIQKEVMAFISARIEKMIGSKVPK 540

RaTQGIVRIL+ FGW IP+D+LV +LY LSF SL YAN+ +VM FI aR++KM+G PK 
Sbjct: 481 RATQGIWILDHFGWRIPMDKLVDSLYDLSFDSLTYAIIKADVMNFIRARVDKMMGKAAPK 540 

Query: 541 DIREAVLESDTYIVSLILK&SQaLVQKSKDAQYKVSVESLSRAEHLREKVTHSVLVDSSL 600 

DIREA+L S T++V +L A++ALV+ S YK +VESLSRAFNLAEK SV VD SL 
Sbjct: 541 DIREAII^STFVVPEMLaftftEftLVKaSHTEiraCPAVESLSRAEmjAEKAI^ 600 

Query: 601 FEmQE:KALYQAILSLE3^TED^IHDNIlDKLFALSPXIIroFFDMTW^m}DEK^IKQNR]^^ 660 

FEN QE L+ ai L L L+++FAI.SP+INDFFnNTMVM D+ +K NRLAII. 

Sbjct: 601 FENEQENTLFAAIQGLTI^SAflQQLEQVFALSPVTNDFFimMVmSDQALKISn^^ 660 

Query: 661 NSLVaKARTVftAFNLENTK 679 

+ LV+KA+T+ AFN IMTK 
Sbjct: 661 SDLVSKAKTIVAFNQLNTK 679 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2199 

A DNA sequence (GBSx2318) was identified in S.agalactiae <SEQ ID 6789> which encodes the amino 
acid sequence <SEQ ID 6790>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2182 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD24436 GB:AF112858 NAD(P)H dehydrogenase [Bacillus
stearothermophilus]
Identities = 64/174 (36%), Positives = 98/174 (55%), Gaps = 6/174 (3%)



Query: NTLIWSHPDFSNPYSFTTILQEK!'lELYNEHFPNHQLSILNLYDCVLPEITKEVLLSIW 61
N L + +HP + S++ + + I ':+ Y + P+H++ L+LY +PEI +V S W
Sbjct: NVIYITAHPH -DDTQSYSMAVGIQ\:':DTYKQVHPDH3VIHLDLYKEYIPEIDVDVF-SGW 60

Query: 62 SKQRKGL---ELTADEIVQAKISICDLLEQFKSHHRIVF\rSPMHNYNVTARAKTyiDNIFI 118
K R G EL+ +E + +L EQF S + VFV+PM N++ K YID + +
Sbjct: 61 GKlRSGKSFEELSDEEKAK^GRMSlELCEQFISaDKYVFVTPMWNFSFPPVLKAYinAVAV 120

Query: 119 AGETFKYTENGSVGLMTDDYRLrMLESaGSIYSKGQYSPYEFPVHYlKAIFKDF 172
AG+TFKYTK G VGL+TD + L +++ G YS+G + E YL I + F
Sbjct: 121 AGKTFKYTEQGPVGLLTDK-KaLHIQftRGGFYSEGPAAEMEMGHRYLSVIMQFP 173



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2200 

A DNA sequence (GBSx2319) was identified in S.agalactiae <SEQ ID 6791> which encodes the amino 
acid sequence <SEQ ID 6792>. This protein is predicted to be glycyl-tRNA synthetase (glyQ). Analysis of 
this protein sequence reveals the following: 
Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1364 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9521> which encodes amino acid sequence <SEQ ID 9522> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05089 GB:AP001511 glycyl-tRNA synthetase (alpha subunit)
[Bacillus halodurans]
Identities = 222/287 (77%), Positives = 250/287 (86%)

Query: 6 LTFQEIILTLQQFWNDQGCMUyiQAynNEKGRGTMSPYTFI^ 65
+ Q +ILTLQ++W+ Q C+L+QAYD ERGAGTMSPYT LR IGPEPWN AYVEPSRRPA
Sbjct: 1 MNVQlMILTLQEWSKGNCILLQAyDTERGAGTMSPYT^ 60

Query: 66 DGRYGENENRLYQHHQFQVVMKPSPSNIQELYLKSLELLGIOTLEHDIRFVEn^^ 125
DQRYGENPNRLYQHHQFQV+MKPSP+NIQELYIi SIi LSINPLEHDIRFVEDNWENPS
Sbjct: 61 DGRYGENEimjYQEfflQFQVIMKPSPTNIQELYLDSIiRflLGIireLEHDIRFVEDOT^ 120

Query: 126 GSAGICWEVWLDGIffilTQFTyFQQVGGLQTGPVTSEVTyGIiERLRSYIQEVDSVYDIEWA 185
G ftGLGWEVWLDGMEITQFTYFQQVGGL+ PV++E+TYGIiERIiRSYIQ+ ++V+D+EW
Sbjct: 121 GCMLGWEVWLDG^mITQFTYFQQUI3GLEANPVSAEITYG]^RIJ^YIQDKEWFDIlEW 180

Query: 186 PGVKyGEIFTQPEYEHSKYSFEISDQVM:iLENFEKFEREAKRai.EEGLVHPAYDYVLKCS 245
G YG+IFTQPEYEHSKY+FE+SD ML E F +E-1-EA RALEE LV PAYDYVLKCS
Sbjct: 181 EGBTYGDIPTQPEYEHSKYTPEVSDSaMLFELFSTYEKEanRALEENLVFPAYDYVLKCS 240

Query: 246 HTFNLLDAEGAVSVTERAlGYIARIRNIARVVAKTFVAERKiajGFPLL 292
HTFNLLDARGA+SVTER GYI R+RNLSR AK + ER+KIiGFP+L
Sbjct: 241 HTENLLnARGAISVTERTGYIGRVRNLARKCaUCKyYEEREKLGFPML 287

A related DNA sequence was identified in S.pyogenes <SEQ ID 6793> which encodes the amino acid
sequence <SEQ ID 6794>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2081 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 290/304 (95%), Positives = 294/304 (96%) 



Query: 2 MSKKLTFQEIILTLQQFWHDQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPMISIAAyVEPS 61
MSKKLTFQEIILTLQQ+^'raDQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPWNaAYVEPS
Sbjct: 1 MSKKLTFQEIILTLQQYWMjQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPWNAAYVEPS 60

Query: 62 RRPADGRYGENPNRLYQHHQFQVVMKPSPSNIQELYLKSLELLGINPLEHDIRFVEDNWE
RRPflDGRYGENPNRLYQHHQFQWMKESPSNIQELYL SLE LGINPLEHDIRFVEDNWE
Sbjct: 61 RREADGRYGENPNKLYQHHQFQVVMKPSPSNIQELYLASIjEKLGINPLEHDIRFVEDNWE 120

Query: 122 NPSTGSAGLGWEVWLDGMEITQFTYFQQVGGLQTGPVTSEVTYGLERLASYIQEVDSVYD 181
NPSTGSAGLGWEVWLDGMEITQFTYFQQVGGL T PVT+EVTYGLERLASYIQEVDSVYD
Sbjct: 121 NPSTGSAGIXSMEVimJGMEITQFTYFQQVGGLATSPVTAEVTyGLERIiASYIQEVDSVYD 180

Query: 182 lEWAPGVKYQEIFTQPEYEHSKYSFEISDQVMLLENFEKEEREAKRALEEGLVHPAYDYV 241
lEMAPGVKYGEIF QPEYEHSKYSPEISDQ MLLENFEKFE+EA RALEEGLVHPAYDYV
Sbjct: 181 lEWAPGVKYGEIFLQPEYEHSKYSPEISDQDMLLENFEKFEKEASRALEEGLVHPAYDYV 240

Query: 242 LKCSHTFNLLDARCaVSWEIUiGYIMlIRraiARWAKTFTMRKKIBFPLLDEETRI 301

LKCSHTEOTjBDftEGRVSVTERflGyiiU?IRHLaRVVaKTFVaERKKI^^^^ TR Lh 
Sbjct: 241 LKCSHTEmjmARGAVSVTERAGYIJ^IRjn^WflKTFTOffiRKKLCSFPLLDEATR^ 300 

Query: 302 ABED 305 
AE+D 

Sbjct: 301 AEDD 304 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2201 

A DNA sequence (GBSx2320) was identified in S.agalactiae <SEQ ID 6795> which encodes the amino
acid sequence <SEQ ID 6796>. This protein is predicted to be vacB protein (vacB). Analysis of this protein
sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9399> which encodes amino acid sequence <SEQ ID 9400> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: SIDQDEDDMFIGKiroiAYAIDGDTVEAVVKKPflDRIJJGTAAEftRVVNIVERSLKTLVGKF 119
D D+FI N++ A++GD V + + +G+ E V+ I+ER+++ +VG +
Sbjct: 88 PEDTSLSDVFIPPNELNTAMNGDIVMVRLNSQS---SGSRQEGTVIRILERAIQRWGTY 144

I+GH+ D GID+L V+ + EFP D + +A++ PD EKDL R DLR +V T

IDCSADAKDIiDDAV + LD+G ++LGVHIADVS+YVTE S +++EAL RGTSVY+ DRV+

PM+P RLSNGICSLNP +DRLT SC M 1+ C3+V H+I QSVI TT RMTY+ ^

Query: 417 RNRGIAERMIESPMLAANETVAEHYARLKLPFITOIHEEPKAEKLQKFIDYASVFGVQIQ 476
R R +AE++IE PML ANETVAEH+ + +EFIYRIHEEP AEKLQKF+++ + pG ++
Sbjct: 439 RERSVAEKLIEEFMLVaWETVaEHFHWMNVPFIYRIHEEPNAEKLQKFLEFVTTFGYVVK 498

Query: 477 GTATKITQSALQDFMKKVQGQPGSEVLSMMLLRSMQQJffiYSEHNHGHyGLaaEYYTHFTS 536 

GTA I ALQ + V+ +P . V+S ++LRSM+Qa+y + GH+GL+ E+YTHFTS 
Sbjct: 499 GTAGNIHPRALQSILDAVRDRPEETVISTVMLRSMRQAKYDPQSLGHFGLSTEFYTHPTS 558 

Query: 537 PIRRYPDLLVHRMIRDy-DDKB(CIKA--DHFANLIPEIATQTSSi:iERRAIDAERIVEaMK 593 

PIRRYPDL+VHR+IR Y + +D+A + +A +P+IA TSS+ERRA+DAER + +K 
Sbjct: 559 PIllRYPDLIVHRLlRTYLINGKVDEaTQEKWaERLPDIAEHTSSMERRaVDAERETDDLK 618 

Query: 594 KAEYMEEYVGEEFEGVVASVVKFGMFVELPNTIEGLIHVTrL-PEYYHENERTLTLQGEK 652 
KAEYM + +GEEF+G+++SV PGMFVELEtJTIEGL+HV+ + +YY F+E+ + GE+ 

Query: 653 SGKVFRVGQQIKVKLIRSDKETQDIDFDYIPSDFDIVEKVSKSSREGRPNRSSKREHQHR 712 

+G VPR+G +1 VK++ +K+ +IDF+ + +G P R + + 
Sbjct: 679 TGNVFRIGDEITVKWDVNKDERNIDFEIV GMRSTPRRPREUD 721 

Query: 713 ISDRDNIQIKin'SKKEASRKPKRNSDSKS 
S R K ++K+ 

Sbjct: 722 -SSRSRKRGKPARKRVQSTOTPVSPAPS-EEKGEWETKPKKKKraCRGFQimPKQiaUCK^ 779 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6797> which encodes the amino acid 
sequence <SEQ ID 6798>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0811 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 579/773 (74%) , Positives = 664/773 (84%) , Gaps = 22/773 (2%) 



MAGAK FP LIKTIS +ES LRF D+GSL+L+K+ KKKE W+G+FRANKAGFGFIi 
Sbjct: 27 MAGAKHFPSLIKTISKMESQSLtRFSDDGSLaLRKEREKKKEPTVQGVFRAMKAGFGFLH 86 

Query: 61 IDQDEDDMFIGKMDIAYArDGDTVEAWKKPADRLNGTAAEARWNIVERSLIO'LVGKFV 120 

+D++EDDMFIG+ND+ YAIDGDTVE \fVKKPADRIj C-TAAEA+W IV+RSLKT VG F+ 
Sbjct: 87 VDENEDDMFIGRNDVGYAIDGDTVEVWKKPADRLKGTAAEAKWAIVDRSLKTAVGTFI 146 

Query: 121 LDDERPKYAGYIKSKNQKINQKIYIRKEPVVI£lGTEIIiCVDIDKyPTRGHDYFVASVRDI 180 

LDD++PKSAGYI+SKNQKI QKIYI+KEPWL GTEIIKVDIDKYP RGHDYFVASVRDI 
Sbjct: 147 II)DDKPKYAGYIRSKNQKIQQKIYIKKEPVVLKGTEIIKVDIDKyPIRGHDYFVaSVRDI 206 

Query: 181 VGHQGDVQIDVLEVLESMDIVSEFPEDVXAEMStRIPDaPTEKDLIGRVDLRQEVTFTIDG 240 

VGHQGDVGIDVLEVLESMDIVSEFP +V+AEANAI +APT KDLIGRVDLRQE T TIDG 
Sbjct: 207 VQHQQDVaIDVLEVLES^roIVSEPPAEVIi2ffl2mISEaPT^Ua3LIGKVDIalQETT^ 266 

Query: 241 .ADAKDLDiaVHIKLII)NGHFEIGVHIJUDVSYYVTEGSMjmEALSRGTSVYVTDRW 300 

ADAKDLDDA+HIKLLDNG++ELGVHIADVSYYVTEGSAL++EA++RGTSVYVTDRVVPML 
Sbjct: 267 ADAKDLDDAIHIKLLDNGNYELGVHIADVSYYVTEGSALDKEAIARGTSVYVTDRWPML 326 

Query: 301 PERLSNGICSLNPNLDRLTQSCIMEIDQNGRWNHQITQSVINTTYRMTYTAVNDIIAGD 360 

PERLSNGICSIiNPN+DRLTQS +MEI+ G WN+QI QSVI TXYRMTY+ VKD+IAGD 
Sbjct: 327 PERLSNGICSLNPWIDRLTQSALMEINSQGHVVNYQICQSVIKTTYRMTYSTVWDMIAGD 386 

Query: 361 EEICSEYESIVSSVQHhOTTiHHTimmTRRGaLNFDTSEAKIMVNDKGMPVDIVIRNRG 420 

EE E+ SI V MV LH LEaMR++RGArJSIFDT EAKI+VNDKGMPVD+V+R RG 
Sbjct: 387 EEALQEFASIADDVTIJ4VM,miI.EaMRSKRGALOTDTQEAKlIVNDKGMPVDVV^ 446 



Query: 421 lAERMIESFMIAMffiTWREHYARLKIiPFIYRIHEEPKAEKLQKFIDYaSVFGVQIQGTAT 480 
lAERMIESFMLAANE VAEH+A+ KLPFIYRIHEEPKAEKIO+PIDYAS FG+ IQGTA
Sbjct: 447 lAERMIESFMLflfllffiCTJffiHFiUCaKLPFrYRIHEEPKAEKLQQFIDYASTFGIHIQGTAN 505

Query: 481 KITQSALQDFMKKVQGQPGSEVLS^mLLRSMCQARYSEHNHGHYGLAAEYYTHFTSPIRR 540

KI+Q ALQ FM KV+GQPG+EVL+MMLI^RSMCQARYSEHlfflGHyGLAAEYYTHFTSPIRR 
Sbjct: 507 KISQEMiQAFMAKVEGQPQAEVLlMILLRSMCQiy^YSEHimGI^LJ^AErrTHFTS 566 

Query: 541 YPDLLVHRMIRDyDDK2M3KMHFAl^JIPEIATQTSSLEERAIDAERIVEAMKKAEY^^ 600 

YPDriLVHRM+R+y+ + +K DHFA +IPE+AT +S LERRAIDaER+VEAMKiaVEYM E 
Sbjct: 567 YPDLL\raRMVEEYNQPSQEKRDHFAQIIPELATSSSQIiERRAinRERVVEai^^ 626 

Query: 601 YVGEEFEGWASVVKFGMFVELPNTIEGLIHVTTLPEYYHEimRTIjTLQGEKSGKOTRVG 660 

YVGEEF+G+V+SWKFG FVELENTIEGL+H+T+LPEYYHFNERTL+LQGEKSGKVF+VG 
Sbjct: 627 YVGEEFIX3IVSSVVTCPGFPVELENTIEGLVHITSLPEYYHFNERTLSLQGEKSGKVFKVG 686 

Query: 661 QQIKVKLIRSDKETGDIDPDYIiPSDPDIWKVSKSSREGRPNRSSKREHQHRISDRDNKN 720 

Q I+VKL+++DKETGDIDP+YLPSDPD+VEK+ S + R +R K+ 
Sbjct: 687 QPIRVKIjVKfiDKETGDIDFEXLPSDPDWEKlKMSDK&SRRDR — RKS 732 

Query: 721 K 

+SK ++PK + +K 

Sbjct: 733 SKSSKBTKKKEPKEVaKaK TKBKTKKGSKKPFYKEQaKKKSRKRS 777 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2202 

A DNA sequence (GBSx2321) was identified in S.agalactiae <SEQ ID 6799> which encodes the amino 
acid sequence <SEQ ID 6800>. This protein is predicted to be VacB homolog (smpB). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2988 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MVKGQGmAaQNKKJ^HHDYTIVETIEaGIVLTGTEIKSVRAARITLKDGYAQIKNGEAWL 60 

M KG+G WAQNKKA HDYTIV+T+EAG+VLTGTEIKSVRAARI LKDG+AQ+KNGE WL 
Sbjct: 1 MAKGEGKWAQNKKARHDYTIVDTLEAGMVLTGTErKSVRAARINLKDGFAQVEOilGEVWL 60 

Query: 61 INraITPYI)QGNIMNQDPDRTRKLMJKIaffiIEKISlffiLK6TG^raJVPIJCVYU 120 

NVHI PY++GNIWNQ+P+R RKLLL K++I+K+ E KBTGMILVPLKVY+KDG+AK+L 
Sbjct: 61 SNVHIAPYEEGNIWNQEPERERKLIiHKKQIQKLEQETK!GTG^ra.VPLKVYIKDGYAm 120 

Query: 121 IXSIAKGKfflJYDKRESIKRREQNRDIARCJEiKNYNSR 155 

LGIjRKGKHDYDKRESIKRREQNRDIflR +K N R 
Sbjct: 121 LGLRKGKHDyDKRESIKRREQNRDIARVMKAVNQR 155 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6801> which encodes the amino acid 
sequence <SEQ ID 6802>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2918 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/155 (80%), Positives = 145/155 (93%) 

Query: 1 ^mCGQ(a!JVVaQNKKAHffl)YTIVETIEaGIVLTGTEIKSVEa^ 60 

M KO+Q+++AGNKKA HDY IVET+EMIVLTOTEIKSVEAARI LKDG+AQIKNGEaVIL 
Sbjct: 1 MT^GEGHILAQNKKRRHDraiVETVERGIVLTGTEIKSVRAftRIQIiKTCFAQIKNGEAW^ 60 

Query: 61 INVHITPYDQGNIWNQDPDRTRKLLLKKREIEKISNELKGT6MTLVPLKVYLKDGFAKVL 120 

+WVHI P++QGKIWN DP+RTRJCLLLKKREI ++NEIiKiG+GMTLVPLKVYLKDGFAKVL 
Sbjct: 61 VWVHIAPFEQGNIVmADPERTRKLLLKKREITHUMffiLKGSGOTLVPLKVYIiKDGFAKOTi 120 

Query: 121 LGLAKGKHDYDKRESIKRREQNRDIARQLKNYNSR 155 

+GLAKGKH+YDKRE+IKRR+Q RDI +Q+K+YN+R 
Sbjct: 121 IGLAKGKHEYDKRETIKRRDQERDIKKQMKHYNAR 155 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
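
The alignment summary lines in these examples (for instance "Identities = 124/155 (80%), Positives = 145/155 (93%)") can be read mechanically. The following is a minimal Python sketch, not part of the original analysis, that extracts the counts and recomputes the ratio. The names `FIELD`, `parse_summary` and `fraction` are hypothetical, and the pattern assumes the summary line has the shape shown above, tolerating the stray spaces introduced by text extraction.

```python
import re

# Matches fields such as "Identities = 124/155 (80%)". Whitespace is optional
# everywhere because extracted lines vary, e.g. "(37%) , Positives".
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in FIELD.findall(line)}

def fraction(entry):
    """Recompute the ratio from the two counts, ignoring the printed percentage."""
    count, length, _reported = entry
    return count / length

summary = parse_summary("Identities = 124/155 (80%), Positives = 145/155 (93%)")
print(summary["Identities"], fraction(summary["Identities"]))
```

Recomputing the ratio from the counts, rather than trusting the printed percentage, gives a quick check on summary lines whose digits may have been damaged in extraction.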



Example 2203 

A DNA sequence (GBSx2322) was identified in S.agalactiae <SEQ ID 6803> which encodes the amino 
acid sequence <SEQ ID 6804>. Analysis of this protein sequence reveals the following: 

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6876 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
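
The "Final Results" listings report one certainty value per candidate localization. As a minimal sketch (not part of the original analysis), the Python below reads such a listing and reports the localization with the highest certainty. The names `LINE`, `parse_final_results` and `predicted_site` are hypothetical, and the pattern assumes cleaned-up lines of the shape "bacterial cytoplasm --- Certainty=0.6876 (Affirmative) < succ>".

```python
import re

# Assumed line shape (after cleaning up the extracted text):
#   bacterial cytoplasm --- Certainty=0.6876 (Affirmative) < succ>
LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)

def parse_final_results(text):
    """Return {localization: (certainty, verdict)} for every matching line."""
    return {
        m.group(1): (float(m.group(2)), m.group(3))
        for m in LINE.finditer(text)
    }

def predicted_site(results):
    """Pick the localization with the highest certainty (first one wins ties)."""
    return max(results, key=lambda site: results[site][0])

listing = """
bacterial cytoplasm --- Certainty=0.6876 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(listing)
print(predicted_site(results), results)
```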

Example 2204 

A DNA sequence (GBSx2323) was identified in S.agalactiae <SEQ ID 6805> which encodes the amino 
acid sequence <SEQ ID 6806>. This protein is predicted to be d-serine/d-alanine/glycine transporter (cycA). 
Analysis of this protein sequence reveals the following: 



Possible site: 55
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = ... Transmembrane ... - 87 ( 62 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 336 ( 315 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 270 ( 251 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 174 ( 154 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 213 ( 196 - ...)
INTEGRAL Likelihood = ... Transmembrane 117 - 133 ( 116 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 298 ( 279 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 358 ( 342 - ...)

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9397> which encodes amino acid sequence <SEQ ID 9398> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14651 GB:299117 amino acid permease [Bacillus subtilis] 
Identities = 165/361 (45%), Positives = 227/361 (62%), Gaps = 17/361 (4%) 

Query: 1 MGlFLT-]^YWlSLIFIGMftEITaVBEYVQFWFPEWPSWIIQrVFM.ILSSINLIAVKAF 59 

M F+T +YW 1 + M&++IHVG Y (Jf W P+ p W+ ++ L IL +ML VK F 
Sbjct: 95 MaAFITGWTYWFCWISLAMMJLTAVGIYTQXWLEWPQWLPGIiALlILLIMN^ 154 

Query: 60 GKTEFWFMIKVIAIMLlATGIFMVLTNFDTGHGYHaSISHITHHFEWFPKGra^ 119 

GE EFWFA+IKVIAIL LI TGI ++ F G AS++N+ +H PP G F ++ 
Sbjct: 155 GElBFWFALIKVIAIIALIVTGILLlAKBFSflJ^G-PASLNmWSHGGlV^ 213 

Query: 120 FQMVFFAYLRIEFVGVTTSETfiTORKVLPKRIQEIPMRIILFYflGSrj^ 179 

FQMV PA++ IE VG+T ET NP+KV+PKAI +IP+RI+LFy G+L IM I+PW L 
Sbjct: 214 FQMVVFAFVGIELVGLTAGETENPQKVIPKAINQIPVRILLFYVBALFVIMCIYPV^^ 273 

Query: 180 V]ffiSPFVTVFKLaaiKWAAALINFVVLTSAASniJaSTLYSTGRHLFQLaiffi--SPNaLTK 237 

NESPFV VF GI AA+LINFWLTSAASA NS L+ST R ++ LA + +P L K 
Sbjct: 274 PNESPFVQVFSAVGIWAASLINFWLTSAASAANSALFSTSRMVYSLAKDHHAPGLLKK 333 

Query: 238 ALKmCJLSEQSVPSRMIAS-AVIVGaSflLISVLPGISDAFSLITASSSGVYISIYVLI 295 

L4- +VPS A+ S A+++G S L ++? F+LIT+ S+ +1 1+ + 

Sbjct: 334 LTSSNVPSNALFFSSIAILIGVS-LN-fLM?--EQVFTLITSVSTICFIFlWGIT 384 

Query: 295 MIAHWKYRKS- -PDFM"EDGYKMPAYKILSPITLLFFLPVF\'-SLPLQDSTYIGAIGATIWII 354 

+ 1 H KYRK+ + + +KMP Y + + +TL F F+ V L L + T I +W + 

Sbjct: 385 VICHLKYRKTRQHEAKANKFKMPFYPLSNYLTLAFLAFILVILALANDTRIALFVTPVWFV 445 

There is also homology to SEQ ID 4070: 

Identities = 286/364 (78%) , Positives = 322/364 (87%) , Gaps = 1/364 (0%) 

Query: 2 GIPLTLSYWISLIFIGMAEITAVGEYVQFWFPEWPSWIIQIVFIAILSSINLIAVKAFGE 61 

G P LSYWISLIFIGMAEITAVG YVQPWFP WP+W+IQ+VFL +LSSIWLIAV+ FGE 
Sbjct: 101 GYFSGLSYWISLIFIGMAEITAVGAYVQFWFPSWPAWLIQLVFLVLLSSINMAVRVFGE 160 

Query: 62 TEFWFAMIKVIAILGLIATGIFMVLTNFDTGHGYHASISNITnHFEWFPKGKLNFFMAPQ 121 

TEFWFAMIK++Air, LIAT IFMVLT F+T H HAS+SNI +HF FP GKL FFMAFQ 
Sbjct: 161 TEFWFAMIKILAILALIATAIFMVLTGFET-HTGHASLSNIFDHFSMFENGKLKFFMAFQ 219 

Query: 122 ^WPFAYLAIEFVGVTTSETANPRK'/LPraIQEIPMRIILFYaGSLLAIM^IFPWC3QLPVM 181 

MVFFAY AIEFVG+TTSETANPRKVLPKAIQEIP RI++FY G+L++IMAI PW QLFV+ 
Sbjct: 220 ■WFFAYQAlEFVGITTSEraNPRKVLPKAIQEIPTRIVIFyVGaLVSIMAIVPWHQr.PVD 279 

Query: 182 ESPFVTVFKIAGIKWAAALINFVVLTSAASAMISTLYSTOHHLPQLaNESETOLTKALK^ 241 

ESPFV VFKL GIKWaARLINFVVLTSaASRINSTI.YSTORHL+Q+ANE+EMaLT LK+ 
Sbjct: 280 ESPFVMVFKLIGIKMAAALINFVVLTSAASaiJKTLYSTGRHLYQIftNETPNaLIMRLKI 339 

Query: 242 DQLSRQSVPSRAIIASAVIVG&SaLISVLPGISDAFSLITASSSGVYISIYVLIMIAHWK 301 

+ LSRQ VPSRAIIASAV+VG SALH-+I,PG++nAFSLITASSSGVYI+IY L MIAHWK 
Sbjct: 340 NTLSRQGVPSRAIIASAWVGISALI1JII.PGVADAFSLITASSSGVYIAIYALTM1AHWK 399 

Query: 302 ■XRKSPDFMEDGYKMPAYKlLSPirLX.FFLFVPVSI.FLQDSTYIGAIGAriWIIGreLYSH 361 

YR+S DFM DGY MP YK+ +P+TL FF FVF+SLFLQ+STYIGAIQ&TIWII FG+YS+ 
Sbjct: 400 YRQSKDFMADGYLiWKYKm'PI.rLAFFflFVFISLFLQESTYIGAIGATIWIIIFGIYSN 459 
Query: 362 FKHK 365 
K K 

Sbjct: 460 VKFK 463 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2205 

A DNA sequence (GBSx2324) was identified in S.agalactiae <SEQ ID 6807> which encodes the amino 
acid sequence <SEQ ID 6808>. Analysis of this protein sequence reveals the following: 



Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.33 Transmembrane ... - 210 ( 191 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 33 ( 14 - ...)
INTEGRAL Likelihood = -5... Transmembrane 125 - 141 ( 119 - ...)
INTEGRAL Likelihood = -3... Transmembrane 155 - 171 ( 153 - ...)
INTEGRAL Likelihood = -1... Transmembrane 96 - 112 ( 94 - ...)
INTEGRAL Likelihood = -0... Transmembrane 49 - 65 ( 49 - ...)

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95438 GB:AF068901 unknown [Streptococcus pneumoniae] 
Identities = 80/214 (37%) , Positives = 122/214 (56%) , Gaps = 3/214 (1%) 



Query: 4 FFSNIRTEIPQMPLLIHSLILSVLPPLMtJLTLVHEDKPLYKTIWSILLGLQLITIYTWFF 63
FF+ T+ P+ L + + ++L + R+K +Y+ + IL +QLI +Y W++
Sbjct: 7 FFTT(3ATKPPKFDLFWYVBLFTLIJVLTFYTAHRyREKKVYQRFPQILQTVQLILLYGWYW 66

Query: 64 VIAKLPLSESLPLraCRIGMFVVLLARPGI--LKDYFALL6WGGVLaMIHPDFYPYQFLH 121
+PLSESLP YHCR+ MFWLL PG K YFALLG G + A ++P Y F H
Sbjct: 67 VtraMPLSESLPFYHCRMAMFVVLLL-PGQSKYKQYFALLGTFGTLaaFVYPVPDAYPFPH 125

Query: 122 VTNIFFFIGHFALFVLSLUnjprrQSKn^DKIiNPKLIIQLTLLINMSLIFIlSn^ 181
+T + F GH AL SL++L+ Q N L+ K I +T +N + +NL+TGG+YGF+
Sbjct: 126 ITILSFIFGHIALLGNSLVYLIJlQYNRRLLDVKGIFLMTFALCiaLIFVVlilLVTGGDYGFL 185

Query: 182 MKTPILGITNPFLNLFIVTTLLSFLVLFVKQIFQ 215
K P++G N +V+ +L + K+I +
Sbjct: 186 TKPPLVGDHGLVANYLLVS IVLVATI SLTKKILE 219






A related DNA sequence was identified in S.pyogenes <SEQ ID 6809> which encodes the amino acid 
sequence <SEQ ID 6810>. Analysis of this protein sequence reveals the following: 

Possible site: 35
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-11.25 Transmembrane 16 - 32 ( 11 - 39)
INTEGRAL Likelihood = -3.45 Transmembrane 154 - 170 ( 153 - 173)
INTEGRAL Likelihood = -3.08 Transmembrane 96 - 112 ( 94 - 112)
INTEGRAL Likelihood = -1.91 Transmembrane 191 - 207 ( 191 - 209)
INTEGRAL Likelihood = -1.12 Transmembrane 71 - 87 ( 71 - 87)

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC95438 GB:AF068901 unknown [Streptococcus pneumoniae] 
Identities = 90/231 (38%) , Positives = 128/231 (54%) , Gaps = 7/231 (3%) 

Query: 3 FFAIDPIGLPHTSLIFYLSSLLIALLLVFLTFQAYRLKS-HRYFFLFLQLSQVIGLYTWY 61 
Query: 62 VI.RGFPLDEai.PLYHaiIJaiLAIFFLPDRiroKQrHlVICIGGTFI^L--SPDLYPFBU^ 119 

+ PL E+LP YHCR+AM + LP ++K+KjQ F +LG GT A + PD YPF 
Sbjct: S6 VmimPLSESLPFYHCamAMFVVLLLPGQSKXKQYFMMTFGTLJiaFVYPVPD&YPFP- 124 

Query: 120 WHVMWSFYFGHYMiLWGLIYLLRFYDASQLRLLSVVRYLATraFLLLLVSLATK^ 179 

H+ 4-SF F6H ALL N LJ-YLLR Y+A L + + +N L+ +V+L T G+YG 

Sbjct: 125 -HITILSFIFGHLALLGNSLVYLLRQYNARLLDVKGIFLMTPALNALIFVVNLVTGGDYG 183 

Query: 180 FVMDIPVIHTRHLLMPVIVTSGLTFMVKITEYFYLKFGEAQQLALAFSKE 230 

F+ P++ L+ N+++V+ L + +T+ L+F AQ+ KE 
Sbjct: 184 FLTKPPLVGDHGLVANYLLVSIVLVATISLTKKI-LEFFLAQEAEKMIVKE 233 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 70/216 (32%) , Positives = 117/216 (53%) , Gaps = 1/216 (0%) 

Query: 2 lEFPSNIRTEIPQMPLLIHSLILSVLPPLMWLTLVNRDKPLYKTIWSILLGLQLITIYTW 61 

++FF+ +P !•+■*■ L + L++LT ++ + L Q+I +YTW 

Sbjct: 1 MDFFAIDPIGLPHTSLIFYLSSLLIALLLVFLTFQAYRLKSHRYFFLFLQLSQVIGLYTW 60 

Query: 62 FFWAKLPLSESLPLYHCRIGMFWL-LARPGILKDYFALLGWGGVLAMIHPDFYPYQFL 120 

+ PL E+LPLYHCRI M + L K F +LG+ G LA++ PD YP++ 

Sbjct: 61 YVLRGFPLDEALPLYHCRIAMLAIFFLPDRNKFKQLFMVLGIGGTFLftLLSPDLYPFRLW 120 

Query: 121 HVTNIFFPIGHFALFVLSLLHLMTQSNLDKLNPKLIIQLTLLINMSLIFINLLTGGNYGF 180 

HV N+ F+ GH+AL V L++L+ + +L +++ +N L+ ++L T GNYGF 
Sbjct: 121 HVaWSFYFGHYALLTOGLIYLLRFYiaSQLRLLSVWYLATVNPLLLLVSLATKGNYGF 180 

Query: 181 MMKTPILGITNPFLNLFI\?TTLLSFLVLFVKQIFQK 216 

+M P++ + m IVT+ L+F+V + + K 

Sbjct: 181 VMDIPVIHTRHLLLNFVIVTSGLTFMVKITEYFYLK 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2206 

A DNA sequence (GBSx2325) was identified in S.agalactiae <SEQ ID 6811> which encodes the amino 
acid sequence <SEQ ID 6812>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3297 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2207 

A DNA sequence (GBSx2326) was identified in S.agalactiae <SEQ ID 6813> which encodes the amino 
acid sequence <SEQ ID 6814>. This protein is predicted to be oxalate:formate antiporter (oxlT-2). Analysis 
of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -7.80 Transmembrane 380 - 396 ( 376 - 399)
INTEGRAL Likelihood = -7.43 Transmembrane 291 - 307 ( 284 - 310)
INTEGRAL Likelihood = -5.63 Transmembrane 169 - 185 ( 163 - 186)
INTEGRAL Likelihood = -4.99 Transmembrane 226 - 242 ( 223 - 245)
INTEGRAL Likelihood = -4.19 Transmembrane 46 - 62 ( 39 - 63)
INTEGRAL Likelihood = -4.09 Transmembrane 311 - 327 ( 308 - 329)
INTEGRAL Likelihood = -1.49 Transmembrane 261 - 277 ( 260 - 278)
INTEGRAL Likelihood = -1.06 Transmembrane 133 - 149 ( 133 - 150)
INTEGRAL Likelihood = -0.85 Transmembrane 98 - 114 ( 98 - 114)
INTEGRAL Likelihood = -0.06 Transmembrane 77 - 93 ( 77 - 93)

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF35228 GB:AF168363 oxalate:formate antiporter [Lactococcus lactis] 
Identities = 220/398 (55%) , Positives = 305/398 (76%) , Gaps = 3/398 (0%) 

Query: 5 NRYWAVSGWLHLMLGSTYAWSVFRNPIISETGWDISSVSFAFSLAIFCLGMSAAFMGH 64
NRYWA +GV+ HLM+GS YAWSVF NPI + GW SSV+ AFS+AI+ LGMSAAFMG
Sbjct: 4 NRYWAFAGVMFHLMIGSVYAWSVFTNPIAKQNGWAESSVaLaPSIAIYFLGMSAAPMGK 63

4-VE+ GPR+ G I++ LYG G ++TG AI

+WLLy++TC++GG+GLG+GY+TFVSTI

IKWFPD+R6LATG AIMGFGFA+++T P+AQ LM +G+ +TFY+L6 YP +M++A+QF

SAASPMAQ + G S ++AA++VQ++G+FNGFGRL+WA+LSDYIGRP TF +FI++ +M

++F IA+ +LM+CYGaGFS++PAYL D+FGTKEL

6PLLLS T+ ++Y LTL F


A related DNA sequence was identified in S.pyogenes <SEQ ID 6815> which encodes the amino acid 
sequence <SEQ ID 6816>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.6180 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF36228 GB:AF168363 oxalate:formate antiporter [Lactococcus lactis] 

Identities = 222/399 (55%) , Positives = 305/399 (75%) , Gaps = 3/399 (0%) 

Query: 3 KTKRYIIATAGILLHLMLGSTYAWSVYRNPILQETGWDQAPVAFAFSLAIFCLGLSAAFM 62 

KT RY++A AG4-+ HLM4GS YAWSV+ NPI ++ GW ++ VA AFS+AI+ LG+SAAFM 
Sbjct: 2 KTNRYWAFAGVMFHLMIGSVYAWSVFTNPIAKQiaGWAESSVALAFSIAIYFLGMSAAFM 61 

Query: 63 GNLVEQYGPRLTGTVSAILYASGNMLTGIAIDRKEIWLLYIGYGVIGGLGLGAGYITPIS 122 

G +VE+ GPRLTGT+++ LY +G ++TG AI + IWriLY+ YGVIGGLGLGAGY+TP+S 
Sbjct: 62 GKVVEKIGPRLTGTIASFLYGTGTIMTGWaiHQNSIMLLYLSYGVIGGLGriGAGYVTPVS 121 

Query: 123 TIIKWPPDKR6MATOPAIMGFGFASIjLTSPIAQW]:.IETEGLVATFYLLGLIVLIVIt[iFAS 182 

TIIKWPPDKR6+ATG AIMGFOFA++LT P+AQ L+ + GL TFYLLG Y ++ML A+ 
Sbjct: 122 TIIKSIFPDKRGllATGLAIMGFGFARMJTGPVAQQL^IaSVGLEQTFYLLGTFYFV■I^a■LaA 181 

Query: 183 QLIIKPTAAEIAILDKKRLQ-NNSyLIEG--MTAKEALKTKSFYCLWVILFINITCGLGL 239 

Q I++P A + + Q + L G +TA +ALKTKSF LW++ FINITCG+GIj 
Sbjct: 182 QFIWPNIALSSTTENSISQKKGTRLTRGPELTANQALKTKSFTFLWIMFFINITCGIGL 241 

Query: 240 ISWAPMAQDBTGMSPEMSAIWGAMGIENGFGRLVWASLSDYIGRRVTVILLFLVSIIM 299 

+S +PMAQ +TaMS + +AI+VG +G+FNGFGRL+WA+LSDYIGR T +F++ I+M 
Sbjct: 242 VSAASPMAQSMTGMSVQTAAIMVGIIGLFNGFGRLIWATLSDYIGRPATFSAIFIBDIVM 301 

Query: 300 TISIjIFAHSSi:iIFMISIATIiMTCYGaGFSLIPPYLSDLFGAKEliATI.HGYILTAWAIAAL 359 

+++ D+F+I++ LM+CYGAGFS+IP YL D+FG KEL +HGY+LTAWA A + 

Sbjct: 302 LSAlLIFKLPU^FVIALa»IJ4SCYGRGFSVIPAYIjSDVFGTKELGAWGY\n^TAWAARGV 361 

Query: 360 TGPMbLSITVEWTHim.LTLCVFIVLYILGIiMVA]:iKLKK 398 

GP+LLS+T + HNY LTL FI++ +L L+++ +++ 
Sbjct: 362 VGPLLLSLTHQLEHNYTLTLAAFILIDLLALLISFWIQR 400 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 252/400 (63%) , Positives = 329/400 (82%) , Gaps = 2/400 (0%) 

Query: 1 MKm^YVVAVSGVVLHIMLGSTYAWSVFRNPIISETGWDISSVSFAFSLAIFCM^ 60 

M+ RY++A 4G++IiHIjMIiGSTYAWSV+RNPI+ ETGWD + V+FAFSLAIFCLG+SAA 
Sbjct: 1 MEKTKRYIIATAGILIj™i^TyAWSVYlOTII^TGWDQ&PVRFAFSIAlFCIX3LSAA 60 

Query: 61 FMGHLVERFGPRIMGMISAILYGAGMVLTGLAIETCXiLWIiLYVAYGiriaGIGLGSGYITP 120 

FMG+LVE++GPR+ G +SAILY +GN+LTGIAI+ +++VO:j.Y+ YG++GG+GLG+GYITP 
Sbjct: 61 FMGNL^?EQYGPRLTGTVSAILYASGNMI.TGLAIDRKEIWLLYI<3YGVIaaI/3IiGAGYITP 120 

Query: 121 VSTIIKWFPDRRGLATGFAIMGFGFASLVTSPLAQSLMIRlGVGKTFYILGLWFFi/I'MI 180 

+STIIKWFPD+RG+ATGFAIMGFGFASL^TSP+AQ L+ G- TFY+LGL+Y 
Sbjct: 121 ISTIIKWFPDKRGMATGFAIMGFGFASLLTSPIAQWLIETEGLVATFYLLGLIYLIVMLF 180 

Query: 181 ASQFIKQPPQEKITILTHDGKKNAMNSQIITGLKANAAIKSKTFYIIWLTLFINISCGLG 240 

ASQ I +P +1 XL D K+ NS +1 G+ A A+K+K+FY +W+ LFINI+CGLG 
Sbjct: 181 ASQLIIKPTAAEIAIL--DKKRLQNNSYLIEGMTAKEALKTKSFyCLWVILFINITCGLG 238 

Query: 241 LISAASPMAQDLAQYSAESAAUiVGVLGIFNaPGRLLWASLSDYIGRPLTFIILFIVNFI 300 

LIS +PMAQDL G S E +A++VG +GIFNGPGRL+MASLSDYIGR +T I+BF+V+ I 
Sbjct: 239 LISVVAPmQDLTGMSPEMSAIVVGaMGIFNGPGRLVWaSLSDYIGRRVTVILLFLVSII 298 

Query: 301 MTSSLFLSFNAIVFAIAMSILMTCYGAGFSLLPAYLSDIFGTKELATLHGYSLTAWAIAG 360 

MT SL + ++++F LMTCYGASFSL+P YLSD+FG KEIATLHGY LTA5JAIA 

Sbjct: 299 MTISLIEAHSSLIFMISIATLMTCYGAGPSLIPPYLSDLFGAKEIATLHGYILTAWAIAA 358 
Query: 361 LFGPLLLSKTYSWGNSYQLTLMVFGFLFLFGLLLSLYLRK 400 

L GP+LLS T W ++Y LTL VF L++ GIj+++L L+K 
Sbjct: 359 LTGPMLLSITVEWTHNYLLTLCVFIVLYILGLMWaLRLKK 398 

A related GBS gene <SEQ ID 8995> and protein <SEQ ID 8996> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 5.06
GvH: Signal Score (-7.5): 4.38
Possible site: 27
>>> Seems to have a cleavable N-term signal seq.

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02272(313 - 1500 of 1818) 

GP|7107009|gb|AAP36228.llAF158363_4|AF168363(4 - 400 of 421) oxalate:formate antiporter 
{Lactococcus lactis} 
%Match =38.5 

%Identity =55.4 %Similarity =79.1 

Matches =220 Mismatches =81 Conservative Sub.s = 94 

216 246 276 306 336 366 396 426 

GK*IC*AE™r*YlQFB™LFIT[reiFKNKT*TOF*KDCLKmiNRYWAVSGWLtILMLGSTYAWSVFRNPIIS 

mill =lh:|IMI null III : II I 
MKTNRYWAFAGVMFHLMIGSVYAWSVFTNPIAKQNGWAES 
10 20 30 40 

456 486 516 546 575 606 636 666 

SVSFAFSLAIFCLGMSAAFMGHLVERFGPRIMGMISAILYGAGNVLTGLAIETQQLWLLYVAYGILGGIGLGSGYITPVS 
ri: = lil:|h lllllllll =11= 111= I l = = III I = = !! II =llll = = ll = = ll = llhll = llM 

SVALAFSIAIYFLGMSAAJmGKVVEKIGPRLTGTIASFLYGTGTIMTGWAIHQNSIWLLYLSYGVIGGLGLGAGYVTPVS 
60 70 80 90 100 110 120 

696 726 756 786 '816 846 876 906 

TIIKWFEDRRGIATGFAIMGFGFASLVTSPLAQSLMIEIGVGKTFYILGLVYFFVMMIASQFIKQPPQEKITILTHDGKK 
llllllll = llllll = llllllll = = = l 1 = 1! II =1= =IIMI II =l = = l = lll =1 == I = 
TIIKWFPDKRGIATGLAIMGFGFAAMLT6PVAQQIMASVGLEQTFXLLGTFYFVIMLLAAQFITOP-NIJa;SSTTENSIS 
140 150 160 170 180 190 200 

936 960 990 1020 1050 1080 1110 1140 

NAMNSQI ITG- - LKANAAIKSKTFYI IWLTLFINISCGLGLISAASPMAQDLAGYSAESAALLVGVLGIENGFGRLLWAS 
= = = I I II |:hl:| =1= :|||I = IMI = IIIIIIII = I I = = I I = = II = = I = II I II I I = I h 
QKKGTRLTRGPELTANQALKTKSFTFLWIMFFINITCQIGLVSAASPMaQSMTQMSVQTAAXMVGIIQLENGFGRIiIWAT 
210 220 230 240 250 260 270 280 



1170 1200 1230 1260 1290 1320 1350 1380 
LSDYICSRPLTFIILFIVNFIMTSSLFLSENAIVFAIAMSILKTCyGAGFSLLPAYLSDIFGTKELATLHGYSLTAWAIflG 
nil nil II =|| = = :| 1:::: = : I 11 = =1 I n 1 II i I 1 = •• II I 1 h I I I I I I = II I II II Mil 

LSDYIGRPATFSAIFILDI^LSAILIFKLPLLFVIALOJLMSCTGAGFSVIPAYLGDWGTKELGaVHGYVLTJ^^ 
290 300 310 320 330 340 350 360 


1410 1440 1470 1500 1530 1560 1590 1620 

LBGPLIiSKTySWGNSYQLTLMVFGFI^FGLLLSLYLRKLTTKW* 

: llllll |: ==l III 1 :: 1= l|:|=:==: 

WGPLLLSLTHQLFHNYTLTIAAFILIDLLMiLISFWIQRDFIKASKLIKKQIIKNYFI^ 
370 380 390 400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2208 

A DNA sequence (GBSx2327) was identified in S.agalactiae <SEQ ID 6817> which encodes the amino 
acid sequence <SEQ ID 6818>. This protein is predicted to be D-Ala-D-Ala adding enzyme (murF). 
Analysis of this protein sequence reveals the following: 

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1311 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 9739> which encodes amino acid sequence <SEQ ID 9740> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95436 GB:AF068901 D-Ala-D-Ala adding enzyme [Streptococcus pneumoniae] 
Identities = 313/453 (69%) , Positives = 375/453 (82%) 



Sbjct: 1 MKLTIHEIAQVVGAKNDISIFEIHQriEKAEFDSRLIGTGDLEVPLKGARDGHDFIETAFE 60 

Query: 92 NGRIATISEKEIEGHPYIiVSnALraiPQVLAQVYIEKMITOVIAVTGSira 151 

M3A T+SEKE+ HPY+LV D L APQ LA YY+EK VDV AVTGSNGKTTTKDM+A 
Sbjct: 61 NGaaVTLSEKEVSNHPyiLVDDVLTAFQSIASYYLEKTTVDVFAVTGSNGKTTTro^ 120 

Query: 152 ILSTTXKTm-QGNYNNEIGLPYTVIjHMPBDTEKIILEMGQl^^ 211 

+LST YKTYKTQQISreNNEIGLPYTVLHMPE TEK+4-LEMGQnHLGDIH+LSE+A+P+ A+ 
Sbjct: 121 LISTRYKTYKTQSNYNNEIGLPYTVl^IPEGTEKLVLEMGQDHMDIHLLSEI^P 180 

Query: 212 VTLIGEAHLEFFGSREKIAEGKMQITDGMSSDGILIAPGDPIIDPYLPANQMTIRPGHDQ 271 

VTL+GEAHL FF R +IA+GKMQI DGM+S +L+AP DPI++ YLP ++ +RFG 
Sbjct: 181 VTLVGBAHLAFFKDRSEIAKBKMQIAlXmSGSIilAPADPIVEDYLPTDKKVVRFGQGA 240 

Query: 272 ELQVTELKEEKHSLTFKTNALEHQLRIPVPGKYtlATNAMVaAYVGKLLAVAEEDIVDALE 331 

EL++T+I. E K SLTFK N LE L +PV GKYMATNAM+A+YV V+EE I A + 

Sbjct: 241 ELEITDI.VERKDSLTFKANFl^QVIiDLPVTGKYiaTNAMIASYVAE.QEGVSEEQIHQAFQ 300 

Query: 332 NLQLTRNRTEWKKSANGADILSDVYNATOTAMRLILETFSAIEIMjGGKKiaLIiADMKEL 391 

+Iri-LTRNRTEWKK+ANGaDILSDVXNaNPTaM+LII^;TFSaiP N+GGKKIA+LADMKEI, 
Sbjct: 301 DLELTRNRTEWKKAANC^MILSDVYNANPTAMKLILETFSAIPAHEGGKKIAVIJmM 360 

Query: 392 GEQSVDLHNQMIMSIRPDSIDTLICYGQDIEGLAQLASQMFPIGKVYFFKKNQEVDQFDQ 451 

G QSV LHIIQMI+S+ PD +DT+I YG+DI LAQLASQMFPIG VY+FKK ++ DQF+ 
Sbjct: 361 Gl^JQSVQLHNQMILSLSPDVLDTVIFYGEDIAElAQLaSQ^WPIGHVYYPKICTEDQDQFED 420 



WO 02/34771 
PCT/GB01/04789 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6819> which encodes the amino acid 
sequence <SEQ ID 6820>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3299 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 323/452 (71%) , Positives = 387/452 (85%) 

Query: 32 MKLSLHlWRKVVGAKN(3VSEPEDVPLGNIEFDSRNISEGDLPLPLKEMmGHEFIEMJi^ 91 

MKL+LHEWRK+V A+N VS+ -l-DVPL +IEFDSR I++GDLPLP):.KB RDGHEPI++AP 
Sbjct: 1 MKLTLHE\mKIVnaQNNVSDLDDVPLHHIEFDSRKITKGDLFLPLKGC3RIX3HEFIDIiAFQ 60 

Query: 92 NG&IATISEKEIEGHPYLLVSDJUJCaPQVIAQYYIEKMNVDVmVTGSNGKTTTKDMIAA. 151 

NGA+AT SEKE+ G P+LLV D LKAFQ lA YYI+KM VDVIAVTGSNGKT+TKDMI A 
Sbjct: 61 NGAVATFSEKELPGKPHLLVEDCLKAFQKLAHYYIDKMRTOVIAVTGSNGKTSTKDMIGA 120 

Query: 152 ILSTTYKTYKTQGtrmiEIGLPYTVLHMPEDTaKIILEMGQDHLGDIIIVLSEIAKPRIAV 211 

+LSTTYKTYKTQGNYNNEIGLPYTVLKMP+DTEKI+LEMGQDH+GDI +LSEIA+PRIAV 
Sbjct: 121 VLSTTyKTYICrQGNYNNEIGLPYTVLHMPDDTEKIVI.EMGQDHMGDIRLLSKIARPRIAV 180 

Query: 212 VTLIGEAHIjBFFGSREKIAEGKMQITDGMSSDGILIAPGDPIIDPYLEaNQMTIRFGHDQ 271 

+TL+GEAHIiE+FGSR+KIA+GKMQI DGM+SDGILIAPGDPIIDPYIjP NQM IRFG+ Q 
Sbjct: 181 LTLVGEAHLEYFGSRDKIAQGKMQIVDGMNSDGILIAPGDPIIDPYLPENQMVIRFtaiQQ 240 

Query: 272 EIlQVTEIlKEEKHSI.TFKImLEHQI^IPVPGKXNRT^fflMV;^^ 331 

E+ VT ++E+K SLTF TH Ii + +P+PGKYMATNaMVAAYVGKI,LAV +EDI+ AL+ 
Sbjct: 241 EIDVTGIQEDKDSLTFTTNVIATPVSLPLPGKraATNAMVaAyVGKLIAVTDEDIIA^ 300 

Query: 332 HLQLTRlQRTEWKKHMSQaDILSDVYH^ 391 

+ LT NRTEMKK+flNQftDILS0VYNai!TPTAMRLILETP+ I N GGKKIA+LADMKEL 
Sbjct: 301 TVTLTGNHTEWKKAANOMILSDVYNANPTAMRLILETFANIAKNPGGKKIAV]^ 360 

Query: 392 GEQSVDLHNQMIMSIRPDSIDTLICYGQDIEGLAQI^QMFPIGKVYFEKKNQEVDQPDQ 451 

G+ SV LH+Qfl S+ +ID L+ YG 1+ LA+LASQ++P +V++F K ++ DQF+ 
Sbjct: 361 GKDSVILHSQLIDSLTSGNIDQLVFYGDHIKEIJUa^QVYPAEQVHYFLKTEQEDQFEA 420 

Query: 452 LLAKVKDTLKEKDQILLKGSNSWSILSKIVDIL 483 

+ V++ L DQILLKSS+SM+L K+VD L 
Sbjct: 421 MAQYVQNiraPFDQILLKBSHSMSLEKLVDRL 452 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2209 

A DNA sequence (GBSx2328) was identified in S.agalactiae <SEQ ID 6821> which encodes the amino 
acid sequence <SEQ ID 6822>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1381 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95435 GB:AF068901 D-Ala-D-Ala ligase [Streptococcus pneumoniae] 
Identities = 243/346 (70%) , Positives = 289/346 (83%) 

Query: 3 KETLILLYCKRSAEREVSVLSftESVtOUilHYDKETVKTyFlTQVGQFIKTQEFDEMPSSD 62 

K+T+ILLYGGRSAEREVSVLSAESVMEA+HYD+P VKT+FI+Q G FIKTQEF P + 
Sbjct: 2 KQTlILLYGGRSAEREVSVLSaESAmEAVNYDRPTVKTFFlSQSGDFIKTQEFSHAPGQE 61 

Query: 63 EKLMTNQTVDLDK^WRPSDIYDDNAIVFPVLHGPMGE!DGSIQGF]:JEVIlRMFYVGTNILSS 122 

++LMTN+T+D DK V PS IY++ A+VFPVIHGPMGEDGS+QGFLEVL+MPYVG NILSS 
Sbjct: 62 DRLMTNETIDWDKKVAPSAIYEEGAWFPVLHGPMGEDGSVQGFLEVLKMPYVGCNILSS 121 

Query: 123 SVAMDKITTKQVLATVGVPQVAYQTYFEGDDLEHAIKLSLETLSFPIFVKPAMMGSSVGI 182 

S+AMDKITTK+VL + G+ QV Y EGDD-I- I E L++P+F KP+NMGSSVGI 
Sbjct: 122 SLAM)KlTTKRVLESAGIAQWYVAIVEGDDVTAKIAEVEEKIi&YPVFTKPSNMGSSVGI 181 

Query: 183 SKATDESSLRSftlDIALKYDSRILIEQGVTAREIEVSIIfiNNDVKTr^ 242 

SK+ ++ LR A+ LA +YDSR+L+EQGV AEEIEVG+LGN DVK+T PGEWKDV FYD 
Sbjct: 182 SKSENQEELRQALKLAFRYDSRVDVBQGVNAREIEVGIiKSNYDVKSTLPGEV^ 241 

Query: 243 YDaKYIDNKITMDIPAKVDEATMEfimQYASKaFKAIGACGLSRCDP5T.TKIX3QIFLNEL 302 

YnaKyiDNKITMDIPAK+ + + MRQ A AP+AIG GLSRODFF T G+IPIjHEL 
Sbjct: 242 YDAKYIDNKITMDIPAKISDDWAVMRQNAETAPRAIGGLGLSRCDFFYTDKGEIFLNEL 301 

Query: 303 NTMPGFTQWSMYPLLWENMGLTYSDLIEKLVMLAXEMFEKRESHLI 348 

NTMPGFTQWSMYPLLW+imG++Y +LIE+LV LAKE F+KRE+HLI 
Sbjct: 302 NTMPGFTQWSMYPLLWDHMGISYPELIERLVDLlUCESFDKREftHLI 347 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4559> which encodes the amino acid 
sequence <SEQ ID 4560>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1451 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 261/348 (75%) , Positives = 306/348 (87%) 

Query: 1 MSKETLILLYGGRSAEREVSVLSAESVMRAINYDKFPVKTYFITQVGQFIKTQEFDEMPS 60
MSK+TL+LLYGGRSREREVSVLSAESVMRA+NYDKF VKrYFITQ+GQFIKTQ+F E PS
Sbjct: 1 MSKQTLVLLYGGRSAEREVSVLSaESVMRA\mDKFLVKTYFITQMGQFIKTQQFSEKPS 60

Query: 61 SDEKUmtQTVDLDKMVRPSDIYDDNAIVFFVLHOPMaEDGSIQGFLEVLRMPYVGTNIL 120
E+imN+T++L + ++PSDIYH-+ A-HVFPVLHGPMGEDGSIQGFLEVLRMPY+GTN++

SSS+AMDKITTK+VL ++G+PQVAY Y +G DLE + +L L+FPIFVKPANMGSSV

GISKA + LR AI LAL YDSR+LIEQGV AREIEVG+LeN+ VK+T PGBV+KDVDF

YDY AKY+DNKITM IPA VD++ + MR YA AFKA+G CGLSRCDFFLT+DGQ++LN

Query: 301 ELNTMPGFTQWSMYPLLWENMGLTYSDLIEKLVMLAKEMFEKRESHLI 348
EENTMPGFTQWSMYPLIiWEINMGL Y DLIE+LV I1R+EMF++RESHLI
Sbjct: 301 ELNTMPGFTQWSMYPLLWENMGIAYPDLIEELVTLAQIMFDQRESHI.I 348 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2210 

A DNA sequence (GBSx2329) was identified in S.agalactiae <SEQ ID 6823> which encodes the amino 
acid sequence <SEQ ID 6824>. This protein is predicted to be recombination protein (recR). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2540 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MLYPTPIJUOiIDSFSKLPGIGTKTATRIAFYriGMSDEDVHBFAKNIJJUUaiELTYCSVC 60 
MLYPTPIAKLIDSFSKLPGIG KTATRLAFYTI MSDEDVN+FAKNIIAAKRELTYCSVC 

Sbjct: 1 MLYPTPIAKLIDSFSKLPGIGAKTATRLAFYTISKSDEDVNDPAKNXiLAAKEELTYCSVC 60 

Query: 61 GNLTDDDPCLICTDKTRDQSVILWEDSKDVSaMEKIQEYNGLYHVLHGLISPMKGISPD 120 

G LTDDDPC+ICTD+TRD++ ILWEDSKDVSAMEKIQEY GLYHVL GLISPMNG+ PD 
Sbjct: 61 GRLTDDDPCIICTDETRDRTKILWEDSKDVSftMEKIQEYRGLYHVLQGLISPMNGVGPD 120 

Query: 121 DiraiKSI.ITRimGQVTEVIVATmTAIX3EATSMYISRVLKPAfiIKVTR^ 180 

DINIiKSLITRLMD +V EVI+AT^Ia.TJffiGEATStreISRVLKPAGIKUTRLRRGLAVBSDI 
Sbjct: 121 DI^^^KSLITRIl^mSEVDEVIIAT^IATJUDGEaTSMYISRVLKPiM3IKOTRIiaR 180 

Query: 181 EYADSVTIitiRAIENRTEL 198 

EYMEVTLLRAIENRTEL 
Sbjct: 181 EYADEVTLLRAIENRTEL 198 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6825> which encodes the amino acid 
sequence <SEQ ID 6826>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2652 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 180/198 (90%) , Positives = 1S2/198 (95%) 

Query: 1 MLYPTPIAKLIDSFSKLPGIGTKTATRLAFYTIGMSDEDVNEFAKNLLAAKRELTYCSVC 60 

+LYPTPIAKLIDS+SKLPGIG KTATRIAFYTIGMS+EDVm-FAKNLLAAKRELTYCS+C 
Sbjct: 1 VLYPTPIAKLIDSYSKIiPGIGIKTATRriaFYTIGMSNEDVlJDFAKNLLaAKREr.TYCSlC 60 

Query: 61 GNLTDDDPCLICTDKTRDQSVIIiVVEDSKDVSAMEKIQEYlJBLYHVLHGLISPMNGISPD 120 

GNLTDDDPC ICTD +RDQ+ ILWED+KDVSAMEKIQEY+G YHVLHGLISPMNG+ PD 
Sbjct: 61 GNLTDDDPCHICTDTSRDQTTILWEnAKDVSAMEKIQEYHaYYHVLHGLISPMNGVGPD 120 
Query: 121 DINUKSLITRLMDGQVTEVIVATNATADGERTSMyiSRVLKPAGIKVTEIARGIljAVGSDI 180 

Dirn^KSLITRLMDG+V+EVIVATNATADGEiaTSMYISRVLKPaGlKVTRLARGLAVGSDI 
Sbjct: 121 DIM.KSLITRLMDGKVSEVIV^TmTADGE)ATS^O■ISRVLKPftGIKVTRIlaRGIiaVGSDI 180 

Query: 181 EYADEVTLLRAIENRTEL 198 

EYADEVTLLHAIENRTEL 
Sbjct: 181 EYADEVTLLRAIENRTEL 198 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2211 

A DNA sequence (GBSx2330) was identified in S.agalactiae <SEQ ID 6827> which encodes the amino 
acid sequence <SEQ ID 6828>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3144 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2212 

A DNA sequence (GBSx2331) was identified in S.agalactiae <SEQ ID 6829> which encodes the amino 
acid sequence <SEQ ID 6830>. This protein is predicted to be penicillin-binding protein 2b. Analysis of this 
protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.69 Transmembrane 23 - 39 ( 17 - 46) 

Final Results 

bacterial membrane --- Certainty=0.6477 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
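
The "Final Results" blocks in these Examples all share one line layout, so the predicted compartment can be read off mechanically. The following is a minimal sketch, not part of the original analysis pipeline: the function names `parse_location` and `most_likely` are hypothetical, and the pattern assumes lines of the form `bacterial <compartment> --- Certainty=<value> (<verdict>)`, tolerating stray spaces inside the number.

```python
import re

# One "Final Results" line: compartment name, certainty value, verdict.
# The layout is assumed from the blocks printed in this section.
LOCATION = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9. ]+)\(\s*([^)]*?)\s*\)"
)

def parse_location(line):
    """Return (compartment, certainty, verdict) or None if the line does not match."""
    m = LOCATION.search(line)
    if m is None:
        return None
    # Remove spaces that scanning may have introduced inside the number.
    certainty = float(m.group(2).replace(" ", ""))
    return m.group(1), certainty, m.group(3)

def most_likely(lines):
    """Return the parsed entry with the highest certainty, or None."""
    hits = [hit for hit in map(parse_location, lines) if hit is not None]
    return max(hits, key=lambda hit: hit[1]) if hits else None

block = [
    "bacterial membrane --- Certainty=0.6477 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
]
print(most_likely(block))
```

Applied to the three lines of this Example, the sketch selects the membrane entry, which agrees with the single "Affirmative" verdict in the block.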

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44614 GB:U58210 penicillin-binding protein 2b [Streptococcus thermophilus] 

Identities = 341/683 (49%) , Positives = 477/683 (68%) , Gaps = 12/683 (1%) 

Query: 4 RKKRYRLTVKKQNASIPRRLNLLFFIIVLLFTVLILRLEQMQIGQQSFYMKKLTALTSYT 63 

++K R ++ +1 RR+ LLP ++ +LF +L RL MQ+ +SFY KKL + YT 
Sbjct: 18 KRKEKRANKPRKPVNISRRVYLLFGVVFVLFLLLFARLTYMQVYHKSFYTKKLEDNSKYT 77 

Query: 64 VKESKARGQIFDZVKGVVLVEiroERPTVaFSIUSKnJISSQSIKELANKLSHYim 123 

V+ + R(3QIPDftKG+ L H + + F+R N +SS ++K +A +L+ +TLTE +D 
Sbjct: 78 VRIASERGQIFDAKGIALTnSQSKDVITFTRSNLVSSDTrOCSVaERLATLVTLTETKVTD 137 


Query: 124 RAKRDYYIJfflKaNYKKVVESLPDSKRyDKFGNHL2ffiSTVYaNAV2UWPVSaiNYSEDELK 183 

R KR++YLAD ANYK+W LP+ K+ DKFGN LAE+T+Y NA+ AVP A+H-YSEDELK 
Sbjct: 138 RQKREFYLADSANYKRVVNDLPNDKKmKBGNKLAEATrYNimimVPDEAVDYSEDE 197 

Query: 184 VVALFWQMmTPTFGSVKLSTSELSDDQIKIOjDADKKELLGISVTSNWHm^ 243 

+V +++ MNA F +V L T +L+ DQI + A +KEL GI V +W R +SLS + 
Sbjct: 198 IWIYSHMNAVSIWSWILKTADLTPDQIAIVAiUCQKEmGIRVAipWERHTSDSSLSPL 257 

Query: 244 LGTISTEKAGLPREEVKKYLKKGYSLNDRVGTSYLEKQYEDDLQGIRQIRKWVNKKGKV 303 

+G +S+ +AGLP+E+ K YLKKGY+LNDRVGTSYLEK+YE+H-LQG +R++ V+K+GKV 
Sbjct: 258 IGRVSSSEAGLPQEDAKDYIiKKGYALiroRVGTSYIiEKEYEEELQGKHTVREITVDKEGK^ 317 

Query: 304 VSDNITQEGKBGEl&KIiTIDUSIYQNKVESILKQYYGSELSSGRASFSEGMYAVAIEPSTG 363 

SD I Q+G G NLKLTIDL++Q VE IL Q SE+S +A++SEGMYAV + TG 
Sbjct: 318 DSDKIIQKBSKGMSn^KLTIDLDFQKBVEDILGQQIiSSEISGNKATYSEGMYAVVMK^^ 377 

Query: 364 KVIAMaGLKNDHG--miVDDSLGTIAKNFTPGSVVKGATL^ 421 

VIAMAG K++ G + D+LGTI FTPGSVVKGATL++GW + + G++VL DQ I 
Sbjct: 378 AVLamGQKHEQGAQDFKADALGTITDVFTPGSVVKGATLTAGWRSGAIYGDQVLTDQPI 437 

Query: 422 ANIRSWFT-RGLTPISAAQALEYSSNTYMVQVALRLMGQDYNTGDALTDRGYQEA 475 

I SWFT +G I+A QALEYSSWTYMVQ+A++ +GQ Y G +L+ ++A 
Sbjct: 438 NIASSPPITSWFTDKGSRAITATQALEYSSNTYMVQIAIKRLGQQYVPGMSLSTDNMEKA 497 

Query: 476 MAKLRKTYGEYGKSVSTGLDLP-ESEGYVPGKYSICTTIJIESFGQYnAYTPMQLGQyiST 534 

M LR TY E+G+GVSTGIjDLP ESEGY+P Y++ Ii E+FGQYD+YT +QL QY+++ 
Sbjct: 498 OTTLRDTYAEPGMGVSTGLDLPGESEGYIPKNYNVANVLTEAPGQYDSYTTIQ^ 557 

Query: 535 IAMNGITOAPHWSDIYEGNDSNKFAQIjVRSITPKTIOTCIAIS 594 

IAN G R+APH+V IY+ + L ++ + ENK+++ +-(-L IIQ+GF++VVNS 

Sbjct: 558 IANGGKRVAPHrVGGIYDAGKNGSIX3TLSSTVDTRVIIIKLSI£)SKQtBIIQQGPHDVVK^ 617 

Query: 595 GSGYATGTSMRG^m■TISGKTGTAETFAKlmIGQTOSTY]!a:JSIAIAYDT^ 651 

GS ATG +M ++ ISGKTGTAET+A H- +G +V+T NLNA+AY T + K+AV +M 
Sbjct: 618 GSSLATGKAMASSIIPISGKTCTAETYATIXSSGNSVTTVOTmVAYATAKDGTKIiAVaiM 677 



Query: 652 YPHVTTDTTKSHQLVARDMIDCJY 674 
YPH +K+HQ + ++4- Y 

Sbjct: 678 YPHALDWKSKAHOMAVKAIMErjY 700 



A related GBS gene <SEQ ID 8997> and protein <SEQ ID 8998> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 8 
McG: Discrim Score: -12.38 
GvH: Signal Score (-7.5): -5.9 
Possible site: 35 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -12.42 threshold: 0.0 
INTEGRAL Likelihood =-12.42 Transmembrane 23 - 39 ( 18 - 46) 
PERIPHERAL Likelihood =4.56 355 
modified ALOM score: 2.98 
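
The INTEGRAL line reports a likelihood together with two residue ranges. A short sketch of how such a line could be parsed is given here; it assumes, as is usual for ALOM-style output but is not stated in this text, that the first range is the predicted membrane-spanning core and the parenthesised range is the extended segment. The name `parse_integral` is hypothetical.

```python
import re

# Pattern for one "INTEGRAL" report line (layout assumed from the lines in this section).
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    """Return likelihood, both residue ranges and their lengths, or None."""
    m = INTEGRAL.search(line)
    if m is None:
        return None
    core = (int(m.group(2)), int(m.group(3)))
    extended = (int(m.group(4)), int(m.group(5)))
    return {
        "likelihood": float(m.group(1)),
        "core": core,
        "core_length": core[1] - core[0] + 1,  # inclusive residue count
        "extended": extended,
        "extended_length": extended[1] - extended[0] + 1,
    }

print(parse_integral("INTEGRAL Likelihood =-12.42 Transmembrane 23 - 39 ( 18 - 46)"))
```

For the line reported above, the core range 23 - 39 covers 17 residues and the parenthesised range 18 - 46 covers 29.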



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5967 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

50.5/71.3% over 683aa 
Streptococcus 
GP|1685112| penicillin-binding protein 2b Insert characterized 

ORF02276(307 - 2322 of 2643) 
GP|1685112|gb|AAC44614.1||U58210(a7 - 700 of 704) penicillin-binding protein 2b {Streptococcus thermophilus} 
%Match = 38.5 
%Identity = 50.4 %Similarity = 71.2 
Matches = 342 Mismatches = 189 Conservative Sub.s = 141 

108 138 168 198 228 258 288 318 

5 liIHGR*NS*LPTTCFRI * *KIKPCFRIIjIiR* II*SIiYKKFRPSWLEFFI i™iLSVCKKEFL*™SSQSFYSKEI)MLNRKK 

:: : :=| 
MTSFWEKNSQKWKKWRQKRK 
10 20 

10 348 378 408 438 468 498 528 558 

RYRLTVKKQNASIPRRl^LFFIIXLLFTVLIIJUiEQMQIGQQSFYMKKLTALTSyrVKESKaRGQIFDAKG^ 

I := :1 l|: 111 == HI =1 II 11= III = 111= = 111111111= 1 I 

EKRANKPRKPWISRRVYLLFGWFVLFLLLEARLTVMQVYNKSFYTKKLEDNSKyiWIASE^^ 

30 40 50 60 70 80 90 100 

15 

588 618 648 678 708 738 768 798 

RPWAFSRGIOTISSQSIKELANKtSHYXTLTEVASSDRAKRDYYLaDKANYKKVVESLPDSK^ 

= = 1=1 I =11 ==l =1 =1= =1111 =11 ll==llll 1111=11 11= 1= Hill 111=1=1 II 
KDVITFTRSmjVSSDTMKSVaERLATLVTLTETKVTDRQKEEFYLADSaNYKRVV^ 
20 110 120 130 140 150 160 170 180 

828 858 888 918 948 978 1008 1038 

VAAVPVSAINYSEDELKVVaLENQMiaTPTFGSVKLSTGEIJSlX)QIKKLtlM 

= III l=:|||||ll=l ====111 . I =11 I =1= III = I =111 II I =11 =111 :=l 

25 imVEDEaVDySBDSLKIWIYSHMKaVSNFSTVII.KTADLTPDQIAIVAaKQKEr^ 

190 200 210 220 230 240 250 260 

10S8 1098 1128 1158 1188 1218 1248 1278 

ISTEKAGLPREEVTCKYLKKGYSIMDRVGTSYLEKQYEDDLQGIRQIRIWVVNKKGKWSDNITQEGKSGRim 
30 :|: :|||| = l= I I I I I I I = 1 II I I I I II I I I = I I = = I II =1 = = 1 = 1 = 111 II I 1 = 1 I 11111111 = = 

VSSSEaGLPQEnaKDyLKKGYAIM)RVGTSYI^KEYEEELQGKHTVREITVDKEGKVDSDK^ 

270 280 290 300 310 320 330 340 

1308 133S 1368 1398 1428 1452 1482 1512 

35 QNKVESILKQYYGSELSSGRASFSEGMYAVAIEPSTGKVLAMAGLKNDHG- -NLVDDSLGTIAKNFTPGSWKGRTLSSG 

1 II II I 11 = 1 =|::llllill = II mill |: = = l == 1 = 1111 I I I I I 1 I I I I I I = = I 
QKGVEDILGQQLSSEISGNKRTYSEGMYAVVMNADTGAVIAMaGQKHEQGAQD 

350 360 370 380 390 400 410 420 

40 1542 1566 1587 1614 1644 1574 1704 1734 

WEISIKVLRGimVLYDQ--EIAN---IRSWFT-RGLTPISAAQALEYSS]mMVQVALR™ 

I : : |: = ll II 11= I INI =1 hi 1111111111111 = 1:= =11 I I =1= = = ll 
WRSGAIYGDQVLTDQPINIASSPPITSWFTDKHSRAITATQfiLEYSSin™VQIAlKRICQQYVPGMSLSTDNMEK^^ 

430 440 450 450 470 480 490 500 

45 

1764 1821 1851 1881 1911 1941 1971 

LRKTYGEYGLGVSTGLDLP-ESEGYVPGKYSLGTTLMESFGQYDAYTPMQLGQYISTIAMNGNRLAPHWSDIYEGNDSN 

II II 1 = 1 = 111111111 11111 = 1 h= 1 1 = 11111 = 11 =11 ll = = = lll I 1 = 111 = 1 11= = 
LRDTYAEFGMGVSTGLDLPGESEGYI PKNYNVANVLTEAFGQYDSYTTIQIjAQYVAS lAHGGKRVAPHI VGGIYDAGKNG 

50 510 520 530 540 550 560 570 580 

2001 2031 2061 2091 2121 2151 2181 2211 

KFACJLTOSITPKTUreiAISDQELAIIQEGFYNVVNSGSGYATGTSMRGNVTTISGKTGTAETFAK^ 
: I := = ll|: = = = = l I I I = 1 I = = I I 1 I I I III =1 == IIIIIIIIIM = =1 =1 = 1 llll 
55 SLGTLSSTVDTRVIiNKLSUDSKQLGIIQCXSFHDVVHSGSSLATGK^^ 

590 600 610 620 630 640 650 660 

2262. 2292 2322 2352 2382 2412 2442 

lAYDTHR- - -KIAVRVMYPHVTroTTKSHQLVARDMIDQYISQFTGQ*ERTFECFTQHQi™*IffAFGtmiV*VLK^^ 
60 :|| 1 : |:|| =1||| =|:|| : =:= | : 

VRYATAKDGTKIAVGIMYPHALDWKSKAHQNAVK&IMBLYGNTH 
670 680 690 700 



SEQ ID 8998 (GBS292) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 68 (lane 9; MW 103kDa). 



GBS292-GST was purified as shown in Figure 211, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2213 

A DNA sequence (GBSx2332) was identified in S.agalactiae <SEQ ID 6831> which encodes the amino 
acid sequence <SEQ ID 6832>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2544 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MVKLVFAEHGESEWNKaNLETCWADVDLSEKSTQQAIDftSKLIQAMIEFDr^^ 60 

MVKLVFJUlHGESEVmRNLFTGVaDVDLSEK!3TQQAIDAGKLI+ AGI+FD A+TSVLKR 
Sbjct: 1 MVKLVFARHGESEWNKAmFTGWADVDLSEKGTQQAIDAGKrJKEaGIKFDQAYTSV^ 60 

Query: 61 AIKTTNLaLEAADQLWVPVEKSWRUffiRHYGGLTGKNKASAAEQFGDEQraiWRRSYDVL 120 

AIKTTNLALEA+DQLWVPVEKSWRLIffil^dYGGLTGKNKS^^ 
Sbjct: 61 AIKrnSniALEaiSDQtiWVPVBKSWRLNEIffiYGGLTGKNKAEAAEQFGDEQTOIM^RSYDV^ 120 

Query: 121 PPDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVG 180 

PP+M +DDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDSKNVFVG 
Sbjct: 121 PPmmDDEHSaHTDRRYaSI£)DSVIPDAEOTJCVTLERALPFWEDKIAPALKDGK^^ 180 

Query: 181 AHGNSIRALVla^IKQLSDDEI^^WEIENFPPLVFEFDEm^LVSEy!^:JGK 230 

AHGNSIRALVKHIK LSDDEIMDVEIPNFPPLVFBFDEKLN+VSEYYLGK 
Sbjct: 181 AHGNSIRALVKHIKGLSDDEIMDVEIPNFPPLVFEFDEKLNWSEYYLGK 230 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6833> which encodes the amino acid 
sequence <SEQ ID 6834>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2546 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 206/229 (89%) , Positives = 214/229 (92%) 



Query: 1 MVKLVFARHGESEVOTHNLFTGVmDVDLSEKCTQQAinAGiaJIQaaGIEFDU^ 60 

MVKLVFAia!6ESEWNKai!n.FTGWaDVDLSEKlGTQQAinAG^^ AGIEFDLRFTSVL R 
Sbjct: 1 M\naiVFARHGESEWNKAl!n^FTGWM>VDLSEKlt3TQQAinASKLIK^ 60 

Query: 61 AIKTTNIALEAADQLWVPVEKSWRMffiRHXGGLTGKWKAEAaEQPGDEQVHIl^ 120 

AIRITOLALE A QLWVP EKSWELNERHYG LT6KNK2VEAAEQF DEQVHIWRRSYDVL 
Sbjct: 61 AIKITOLALENAGQLWVPTEKSWRIJffiRHbrGSULTGKNKaEAftEQFC^^ 120 

Query: 121 PPDMRKDDEHSAHTDRRYASIJJDSVIPIHffiNLKOTTiERALPPVffiDKlAPALKDGKN^ 180 

PP MAKDDE+SAH DRRYA LD ++IPnfiENLKVTLERA+P+ME+KIAPAL DGKNVFVG 
Sbjct: 121 PPANIAKDDEYSAHKDRRYflDIiDPALIPDflEIsaKVTLERAMPYWEEKIAPAIiLDGKNVFVG 180 

Query: 181 AHGNSIRALVKHIKQLSDDEIMDVElPNFPPLVFEFDEKimVSEYYLG 229 

SEQ ID 6832 (GBS110) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 38 (lane 8; MW 28.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 10; MW 53.9kDa). 

The GBS110-GST fusion product was purified (Figure 204, lane 5) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 252A), FACS (Figure 252B), and in the in vivo 
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2214 

A DNA sequence (GBSx2333) was identified in S.agalactiae <SEQ ID 6835> which encodes the amino 
acid sequence <SEQ ID 6836>. This protein is predicted to be triosephosphate isomerase (tpiA). Analysis of 
this protein sequence reveals the following: 

Possible site: 54 

»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.37 Transmembrane 36 - 52 ( 36 - 52) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC43268 GB:U07640 triosephosphate isomerase [Lactococcus lactis] 

Identities = 164/252 (65%) , Positives = 202/252 (80%) 

Query: 1 MSRKPFIAGasWKMKKNPEESUCaFIEAVASKLPSSELVEaGIAAPALTLSTVLEAJU^ 60 
MSRKP lAGNWKMNK EA+AF+EAV + I1PSS+ VE+ I APAL L+ + +GSEL 

Sbjct: 1 MSRKPIIACaNWKMSKTLSEAQAFVEAVKMSnjPSSDNVESVIGAPALFLAPMAYLRQGSE 60 

Query: 61 KIAAQNSYFENSGAFTGENSPKVLAEMGTDYWIGHSERRDyFHETDQDINKKAKAIFAN 120 

K+AA+NSYFEN+GAFTGENSP + ++G +Y++IGHSERR+YFHETD+DINKKAKaiFA 
Sbjct: 61 KLAAENSYFENAGAFTGENSPflAIVDLGIEYIIIGHSERREYFHETDEDINKKflKAlFAA 120 

Query: 121 GLTPIICCGESLETYEASKAVBFVGAQVSAALAGLSEEQVSSLVIAyEPlWAIGTaKaAT 180 

G TPI+CCGE+LET+ERI3K E+V Q+ A LaGL+ EQVS+LVIAYEPIMAIGTGK+AT 
Sbjct: 121 GATPILCOGETLETFEAGKTAEWVSGQIEAGLAGLTAEQVSNLVIAYEPIWAIGTGKTAT 180 

Query: 181 QDDAQNMCKAVRDVVaADFGQAVADKVRVQYCXSSVKPElSIVAEYMRCPDVDC^ 240 

+ A C VR V +G+ V++ VR+QYGGSVKPE + MA ++DGALVGQASLE 
Sbjct: 181 HEIADETCC3VVRSTVEKLTOKEVSEAVRIQYGGSVKPETIEGimKENIDGALVGGASLB 240 

Query: 241 AESFLALTiDFVK 252 

A+SFLALIi+ K 
Sbjct: 241 ADSPLALtEMYK 252 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6837> which encodes the amino acid 
sequence <SEQ ID 6838>. Analysis of this protein sequence reveals the following: 

INTEGRAL Likelihood = -1.81 Transmembrane 36 - 52 ( 35 - 52) 

Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 220/251 (87%) , Positives = 237/251 (93%) 

Query: 1 MSRKPFIAGNWKMNKNPEEAKAFIEAVASKIPSSELVEAGIAAPALTLSTVLEAAKGSEL 60 
MSRKP IAGNWKMWKNP+EAKAF+EAVASKLPS++LV+ +AAPA+ L T +EAAK S L 
Sbjct: 1 MSRKPI I^y3^WK^raKNPQEAKAFVEAVASKLPSTDLVDVAVAAPAOTLVTTIEAAKDSVL 60 

Query: 61 KIAAQNSYFENSGAFTGENSPKVXiAEMGTDYWIGHSERRDYFHETDQDINKKAKAIFAN 120 
K+AAQNT YFEU+GAFTGE SPKVBAEMG DYWIGHSERRDYFHETD+DINKKAKAIFAN 


GLTPI+CCCSESLETYEAGKaVEFVGAQVSAALRGLS EQV+SLV+AYEPIWAIGTGKSAT 



QDDAQNMCKAVRDWAADFGQ VADKVRVQYGGSVKPENV +yMACPDVDGALVGGRSLE 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2215 

A DNA sequence (GBSx2334) was identified in S.agalactiae <SEQ ID 6839> which encodes the amino 
acid sequence <SEQ ID 6840>. Analysis of this protein sequence reveals the following: 








>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3050 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB41198 GB:U75481 elongation factor-Tu [Streptococcus mutans] 
Identities = 44/45 (97%) , Positives = 45/45 (99%) 

Query: 1 MVMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSKIEA 45 

MVMPGDNVTI+VELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 
Sbjct: 117 MVMPGONVTIDVELIHPIAVEQGTTFSIREGGRTVGSGIVSSrEa 161 
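
Each Query or Sbjct row carries a start coordinate, the aligned residues and an end coordinate, so a row can be checked for internal consistency: the end should equal the start plus the number of residues (ignoring gap characters) minus one. The sketch below illustrates that check; `check_row` is a hypothetical helper, and it assumes the three fields are separated by whitespace and that `-` is the only gap character.

```python
import re

# One alignment row: label, start coordinate, residues, end coordinate.
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")

def check_row(line):
    """Return (label, start, end, consistent) or None if the row does not parse."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    residues = len(seq.replace("-", ""))  # gap characters do not consume coordinates
    return label, start, end, residues == end - start + 1

print(check_row("Query: 1 MVMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSKIEA 45"))
print(check_row("Sbjct: 117 MVMPGONVTIDVELIHPIAVEQGTTFSIREGGRTVGSGIVSSrEa 161"))
```

Both rows of the elongation factor-Tu alignment above pass this check (45 residues each, spanning 1 to 45 and 117 to 161). A row that fails it is a likely place where scanning has damaged either the coordinates or the sequence.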

There is also homology to SEQ ID 1022: 

Identities = 44/45 (97%) , Positives = 44/45 (97%) 

Query: 1 MVMPGDNVTIEVELIHPIAVEQGTTPSIREXSGRTVGSGIVSEIEA 45 

MVMPODNVTI VELIHPIAVEQGTTPSIRBGGRTVGSGIVSEIEA 
Sbjct: 371 MVMPODNVTINVELIHPIAVEQGTTFSIRBGGRTVGSGIVSEIEA 415 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2216 

A DNA sequence (GBSx2335) was identified in S.agalactiae <SEQ ID 6841> which encodes the amino 
acid sequence <SEQ ID 6842>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.66 Transmembrane 81 - 97 ( 80 - 97) 
INTEGRAL Likelihood = -2.60 Transmembrane 18 - 34 ( 17 - 34) 

Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2217 

A DNA sequence (GBSx2336) was identified in S.agalactiae <SEQ ID 6843> which encodes the amino 
acid sequence <SEQ ID 6844>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0596 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2218 

A DNA sequence (GBSx2337) was identified in S.agalactiae <SEQ ID 6845> which encodes the amino 
acid sequence <SEQ ID 6846>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3559 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2219 

A DNA sequence (GBSx2338) was identified in S.agalactiae <SEQ ID 6847> which encodes the amino 
acid sequence <SEQ ID 6848>. Analysis of this protein sequence reveals the following: 

Sf-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF96286 GB:AE004374 hypothetical protein [Vibrio cholerae] 

Identities = 5S/167 (33%) , Positives = 89/167 (52%) , Gaps = 12/167 (7%) 

Query: IB LAIIKSLPIOTCVILCMTLRNFTOJKLS-GINBTLTC 73 
L + L L C++ AG +RN VW+ L + T +DIDV+PED + YE++ LE 

Sbjct: 41 LECVYQLBLPQCYIAAGFVFHLVWDSI.HHNVKIjTPIiNDIDVIFFDaDCLDSDYEKS--i:iE 98 

Query: 74 QQLKDNYPQYDWELKNEFYMNTHSENTPKXTSSKDAISKFPEKCTAVQARLDDRNQLELY 133 , 

+L + PQ +W++KN+ M+ + + P Y S+ DA+S +PEK TAV R + ++ E 
Sbjct: 99 LKLSEQMPQLNWQVKNQAKMHLQNGDNP-YQSTLDAMSYWPEKETAVAVRKVEHDRYECI 157 


Query: 134 LPYGEEEILNFIVSPTPYFEEDLLRYNVYLXRVDKKKKMNIWPRLTI 180 

+G E + ++ P Y ++ RV K W +WP L I 

Sbjct: 158 SAFGFESLFQGFITHNP KRAYG1FENRVKSKGWLAI1WPNLRI 199 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2220 

A DNA sequence (GBSx2339) was identified in S.agalactiae <SEQ ID 6849> which encodes the amino 
acid sequence <SEQ ID 6850>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2779 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13060 GB:Z99110 yjdF [Bacillus subtilis] 
Identities = 47/138 (34%) , Positives = 93/138 (67%) , Gaps = 2/138 (1%) 

MK+T+Y+DG FW+6++E D+G + FR+ PGKEP+D +V F++++L +++ + E + 
Sbjct: 24 MKLTIYYDGQFWGVVEVVDNGKIJRAFRHLPGKBPRDSEVI1EFVHNQLLNMMAQAE--QE 81 

Query: 61 DISLKRTNEHKXSPKRMQREINREKRKPVVSTKAQLaMKTIHMSIKMERQLSQKCKKNEL 120 

+ L+ + K +PKR+<3R++++E + V++KAQ A+K + K +++ K + 
Sbjct: 82 GVRIOSRRQKKINPKRIiQRQVSKELKNAGVTSKRQEAIKLEIiEflRKQKKKQIMKESR^ 141 

Query: 121 RKHRYQLKQEKRYQKKKG 138 

++ RY LK++K +K +G 
Sbjct: 142 KEQRYMLKKQK&KKKHRG 159 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2221 

A DNA sequence (GBSx2340) was identified in S.agalactiae <SEQ ID 6851> which encodes the amino 
acid sequence <SEQ ID 6852>. This protein is predicted to be ComX1. Analysis of this protein sequence 
reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3143 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9469> which encodes amino acid sequence <SEQ ID 9470> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 EEIiFDKVKPIVMKnRRNTFVQLWEYlOTlQEGRIVIiFRLLEEHPYLLDiraSKIiFIYFK^ 64 

+EI1+++V+ V K R Y++ LWE DW QEG + L L+ L+Dh- +L YFKTK 

Sbjct: 3 KELYEEVQGTVYKiaamXlHLVffiLSIMQEGMLCIiHELISREEGLVDDIPKLRK^ 62 

Query: 65 FSimJroVLRHQDCQKRQE^^(MPYEEISEVSHyVKSi(GLVLDDYIAYRDm 124 

F N + D +R Q+ QKR+++K PYEE+ E+SH + Gh lEm + +TL S 
Sbjct: 63 FRmiLDYIRKQESQKRRYDKKPYEEVGEISHRISEGGLWLDnYyLFHETLRDYENKQSK 122 

Query: 125 IDKEKFEKLISGERFAGKKQFIRDIQPFFNaF 156 

+E+ E+++S ERF G+++ +RD++ F F 
Sbjct: 123 EKQEELERVLSNERERGRQRVLRDLRIVFKEF 154 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6853> which encodes the amino acid 
sequence <SEQ ID 6854>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

ileavable N-term signal seq 

25 ( 7 - 28) 



Final Results 

bacterial membrane --- Certainty=0.5140 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related sequence was also identified in GAS <SEQ ID 9163> which encodes the amino acid sequence 
<SEQ ID 9164>. Analysis of this protein sequence reveals the following: 



Possible site: 29 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood =-10.35 

Final Results 

bacterial membrane --- Certainty=0.160 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 



Query: 41 FEKVKPIILKLKRHYYIQLWDRDDWLCEGHIILLQLLERYPELIEEEERLYRYFKTKFSS 100 

+E+V+ + K + YY+ LW+ DW QEG + L +L+ R L+++ RL +YFICTKF + 
Sbjct: 6 YEEVQGTVYKCRNEYYLHLWELSDWDQEGMLCLHEJlilSRBEGLVDDIPRIiRKYFKTKFRN 65 

Query: 101 YLIODLLRRQESQKRQFHKIAYEEIGEVAHAIPSRGLWLDDYVAYQEVIASLENQLNSQER 160 

+ D +R+QESQKR++ K YEE+GE++H I GLWLDDY + E + N+ + +++ 
Sbjct: 66 RILDYIRKQESQKRRYDKEPYEEVGEISHRISEGGLWLDDYYLFHETTjRDYRNKQSKEKQ 125 

Query: 161 MQFQMiIRGERFRGREALLRKISPYFKEF 189 
+ + ++ ERF+GR+ +1R + FKEF 

Sbjct: 126 EELERVLSNERFRGRQRVLRDIiRIVFKEF 154 



An alignment of the GAS and GBS proteins is shown below. 
Identities = 78/149 (52%) , Positives = 116/149 (77%) 


Query: 8 FDKVKPIVMKLRRNYFVQLWEYDDWIQEGRIVLFRtLEEHPYLLDNESKLFIYFKTKFSN 67 

F+KVKPI++KL+R+Y++QLW+ DDW+QEG I+L +I1LE +P L++ E +L+ YFKTKFS+ 
Sbjct: 41 FBKVKPI IIitCLKRHYYIQLWDRDDWLQEGHI ILLQTiTiERYPELIEEBBRLYRYFKTKFSS 100 

Query: 68 YI^NDVLRHQKQKRQFNKMPYEEISEVSHYVKSKGLVLDDYIAYRDTLTKVEETLSDIDK 127 

YL D+LR Q+ QKRQF+K+ YEEI EV+H + S+GL LDDY+AY++ + +E L+ ++ 
Sbjct: 101 YLKDIiRRQESQKRQFHKLAYESIGEViUmiPSRGimiDDYmYQEVIASLENQiaisQER 160 

Query: 128 EKFEKLISGERFAGKKQFIRDIQPFEKAF 156 
+F+ LI GERF G++ +R I P+F F 

Sbjct: 161 MQFQALIRGERFKGRRALLRKISPYFKEF 189 
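
The summary line of an alignment gives each statistic as a count over the alignment length plus a percentage. The sketch below recomputes the percentage from the counts. It assumes the percentage is the integer part of 100 x count / length, which matches the figures reported for this alignment (78/149 gives 52, 116/149 gives 77) but is an inference from the numbers, not something the text states. The names `parse_stats` and `floor_percent` are hypothetical.

```python
import re

# Matches e.g. "Identities = 78/149 (52%)"; the layout is assumed from the summary lines above.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")

def parse_stats(line):
    """Map each statistic name to (count, length, reported_percent)."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in STAT.findall(line)
    }

def floor_percent(count, length):
    """Integer part of the percentage, using integer arithmetic only."""
    return (100 * count) // length

summary = "Identities = 78/149 (52%) , Positives = 116/149 (77%)"
for name, (count, length, reported) in parse_stats(summary).items():
    print(name, floor_percent(count, length), reported)
```

Where a recomputed value disagrees with the printed one, the cause may be a scanning error in one of the digits or a different rounding convention, so a mismatch is only a prompt to check the source.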

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2222 

A DNA sequence (GBSx2341) was identified in S.agalactiae <SEQ ID 6855> which encodes the amino 
acid sequence <SEQ ID 6856>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -2.23 Transmembrane 166 - 182 ( 156 - 182) 

Final Results 

bacterial membrane --- Certainty=0.1893 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA99510 GB:Z75191 ORF YOR283W [Saccharomyces cerevisiae] 
Identities = 57/226 (25%), Positives = 97/226 (42%), Gaps = 22/226 (9%) 


Query: 4 VRLYIARHGKTMENTIGRAQGWSDTPLTTFGSLGIKELGLGLKaSNISFKEAFSSDSGRT 63 

+RL+I RHG+T N QG DT + GE +LG L++ I F + SSD R 

Sbjct: 17 IRLFIIRHGQTEHNVKKILQGHKDTSINPTGEEQATKLGHYLRSRGIHFDKVVSSDIjKRC 76 

Query: 64 LQTMEIILREVQQENIPYTRDKRIREWCFGSLDGGTOGDLFKGVLPRVSNGDMSHLTHEE 123 

QT ++L+ +QEN+P + +RB G ++G M E+ 

Sbjct: 77 RQTTAIiVLKHSKQEKVPTSYTSGIiRBRyMGVIEXS MQITEAEK 118 

Query: 124 IMILICQVDTAGWAEPWAILSNRILSGFTAIAKKIEDIGGGNArWSHGMTIATFL-WL- 181 

A++ +E R+ ++GN +VSH6 I L WL 

Sbjct: 119 YADKHGEGSFRNFGEKSDDFVTVRLTGCVEEEVREaSNEGVrarajALVSHGGAIRMILQWLK 178 

Query: 182 IDHSTPRSLGLDNGSVSWDF— EDSTFSIQSIGDMSYREKGREIL 225 

++ + + N SV++VD+ + F ++ +G+ + G ++ 

Sbjct: 179 YEiraQAHKIIVElWSOTIVriYVKDSKQFIVRRVGISmSHLGDGEFVV 224 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6857> which encodes the amino acid 
sequence <SEQ ID 6858>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Transmembrane 170 - 186 ( 170 - 186) 

Final Results 

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

YOR283W [Saccharomyces cerevisiae] 
, Positives = 98/231 (41%) , Gaps = 27/231 (11%) 

Query: 5 RLYIARHGKTMENTIGRAQGWSDTPLTKKGEEGIRELGLGLKDATIPFKAAFSSDSGRTM 64 

RL+I RHG+T N QG DT + GEE +LG L+ IF SSD R 

Sbjct: 18 RLFIIRHGQTEHmnHCI^3GHKDTSINPTGEEQATKLGHYLRSRGI•HFDKVVSSDLKRCR 77 

Query: 65 QTIEIILRESENEPLPYTKDNRIREWCFGSLEGTYDSELFLGVLPRTKRFENRDNLRDVP 124 

QfT ++L+ S+ E +P + ^ +RE G +EG 1-E 
Sbjct: 78 QTTALVLKHSKQKNVPTSYTSGLRERYMGVIEGMQITEA 116 

Query: 125 YSEIAESIVEVDTAMWAEPWEVmKRI^ffiGFEAlALSIQ^CVGG{3^IRLWSHG^CT^ 183 

+ A+ E N+ E + R+ E NGN +VSHG I L 

Sbjct: 117 -EKYADKHGEGSFRNFGEKSDDFVARLTGCVEEEVAEASNEGVKNLALVSHGGAIRMILQ 175 

Query: 184 WL--IDPDRDKQYIDNGSVTWEF--DDGQFTIKTIGDMSXRYRGREIIEE 230 

WL + K + N SVT+V++ D QF ++ +G+ + G ++ + 
Sbjct: 176 WLKyENHQRHKIIVFNTSVTIVDYVKDSKQFIVRRVGNTQffljGDGEFWSD 226 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 150/231 (64%) , Positives = 182/231 (77%) , Gaps = 5/231 (2%) 

Query: 1 MSKVRLYIARHGKTMFNTIGRAQGWSDTPLTTFGELGIKELGLGLKASNISFKEAFSSDS 60 

M+K RIYIARHGKTMFNTIGRAQGWSDTPLT GE GI+ELGLGLK + I FK AFSSDS 
Sbjct: 1 MTKTRLYIARHGKTMFNTIGRAQGWSDTPLTKKGEEGIRELGLGLKDATIPFKaAFSSDS 60 

Query: 61 GRTLQTMEIILREVQQENIPYTRDKRIREWCFGSLDGGYDGDLENSVLPRV SNGDM 116 

GRT+QT+EIILRE + E +PYT+D RIREWCFGSL+G YD +IiF GVLPR + ++ 

Sbjct: 61 GRTMQTIEIILRESENEFLPYTKDNRIREWCFGSLEGTYDSELFLGVLPRTKAFENRDNL 120 

Query: 117 SHLTHEEIANLICQVDTAGMAEPWAIIiSNRILSGFTAIAKKIEDIGGGNAIWSHGMTIA 176 

+ + E+A I +VDTA WKEPW +L RI GF AIA 1++ GGGNA+WSHGMTI 
Sbjct: 121 RDVPYSELAESIVEVDTAIlIWaEPWEVIJ«alIWEGFEAIALSIQ^IAGGG^SC^LWSHGMTIG 180 

Query: 177 TFLWLIDHSTPRSLGLDNGSVSWDFEDGTFSIQSIGDMSYREKGREILEK 227 

TFLWLID + +DHGSV+W+F+DG F+I++IGDMSYR +GREI+B+ 
Sbjct: 181 TFLWLIDPDRDEQY-inNGSVTWEFDDGQFTIKTIGDMSYRYRGREIIEE 230 

A related GBS gene <SEQ ID 8999> and protein <SEQ ID 9000> were also identified. Analysis of this 
protein sequence reveals the following: 

Cytoplasmic predicted but experimentally found on the surface of Streptococci 
32.3/52.0% over 184aa 
Thermotoga maritima 
EGAD|165681| phosphoglycerate mutase Insert characterized 
GP|4981935|gb|AAD36444.1|AE001791_6|AE001791 phosphoglycerate mutase Insert characterized 
PIR|G72260|G72260 phosphoglycerate mutase - (strain MSB8) Insert characterized 

ORF01265(26a - 870 of 1248) 
EGAD|165681|TM1374(1 - 185 of 201) phosphoglycerate mutase {Thermotoga maritima} 
GP|4981935|gb|AAD36444.1|AE001791_6|AE001791 phosphoglycerate mutase {Thermotoga maritima} 
PIR|G72260|G72260 phosphoglycerate mutase - Thermotoga maritima (strain MSB8) 
%Match = 6.3 
%Identity = 32.2 %Similarity = 52.0 
Matches = 57 Mismatches = 78 Conservative Sub.s = 35 

105 135 165 195 225 255 285 315 

20 RGRiraSYEIFHPFSMLLKRimFYFCSR*LQNPFIGKA«*YIPVKAFVPCXraiKCL*GVSMSKVRL^ 

:=ll: lll=|::| 
MKLYLiXRHGETimEK 
10 

25 345 375 405 435 465 495 519 549 

GRJVQGWSDTPriTTFGELGlKEI/3IiGriKASNISFKEAFSSDSXRTIjQTMEIiriREVCK5ENI--PYT^ 
I II :| II I ::| II HI hhl I I 1 = = 1 I hit 

GLWQGVTDVPUJERQREQaRiaANSIjK RVDAIYSSPLKRSLETAEEIAKRFEKEIIVEEDLRECEISLW 

30 40 50 60 70 80 

30 

579 609 639 669 699 729 759 

GYDGDIiFNGVLPRVSNGDMSHLTHEEISNLICQVDTAGWa EPMAILSHRILSGFTAIAKKIEDIGGGNAI 

: II II I 1= b I : ll=: I : : I I = 

HGLTVEE-AIREYPVEFKKWSSDPNFGMEGLESMRNVQNRWKAIMKIVSQEKLNGSENW 

35 90 100 110 120 130 140 

789 816 840 870 900 930 950 990 

WSHaMTIATFL-WLIDHST--PRSLGIJ3NGSVSVVDFEDGTFSIQSIGDMSYREKGREILEKILQ*KKIKLSDSV*LVF 
:||1 1= 1== 1=: 111 |:|l|: I = =1 

40 IVSHSLSLRAFICWILGIiPLYLHHNFKLDNASLSVVEIESKPELVIjIiNDT(^ 

160 170 180 190 200 

SEQ ID 9000 (GBS44) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 4 (lane 6; MW 27kDa), in Figure 168 (lane 8-10; MW 42kDa - thioredoxin 
fusion) and in Figure 238 (lane 7; MW 42kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 12 (lane 8; MW 52.4kDa). 

Purified Thio-GBS44-His is shown in Figure 244, lanes 7 & 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2223 

A DNA sequence (GBSx2342) was identified in S.agalactiae <SEQ ID 6859> which encodes the amino 
acid sequence <SEQ ID 6860>. This protein is predicted to be d-alanyl-d-alanine carboxypeptidase. 

Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 79 ELSI>DVVPVE(Iiyi£)KRITKQRTQFLEAARAIDSREHIjISSYRSVaYQEKLFNSyVTQEM 133 

E4-4-PDV ++ + 4-D RI + +FL AA+ IDS EHI.ISGYRSVa.YQE+L+N4-Y+ QE 
Sbjct: 4 EMNPDWDIDGVKOTSRIAENTRKFLAJ^EIDSSEHLISGYRSVftYQEELTOiyiAQEK 63 

Query: 139 TSNE^mTRGQAEKLVECTYSQPAaASEHQTC3La^roMSTVDSUffiSDPRWSQLKKIAEQY 198 

+NP+L++ +A+K V+TYSQP G+SEHQTGLA+DMSTVDSLN+SD W+++ lAP+YG 
Sbjct: 64 ANNPSLSQEEAQKQVQTYSQPPGSSEHQTGLAIDMSTVDSLNQSDANWAKVAAIAPKYG 123 

Query: 199 FVLRFPDGKTAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKE 247 

FVLRFP+GK TG+ YEDWHYRYVGV+SAKYM KH LTriEEy+ LKE 
Sbjct: 124 FVIKFPEGKKiaTGIDSEDWHYRYVOVKSaKJOTKHDLTLEEYLKKLKE 172 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6861> which encodes the amino acid
sequence <SEQ ID 6862>. Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.66 Transmembrane 10 - 26 ( 3 - 29)

Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
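
The "Final Results" lines above follow a fixed pattern (compartment, certainty value, verdict), so they can be read into structured records. The following is a minimal sketch in Python, assuming that pattern; the names `parse_final_results` and `best_location` are hypothetical helpers introduced here for illustration and are not part of the localisation program's output or of the original analysis.

```python
import re

# Tolerant pattern: scanned text often has stray spaces inside the number
# and varying dash separators, so both are treated as optional noise.
RESULT_RE = re.compile(
    r"bacterial\s+(?P<location>\w+)\s*[-—\s]*"
    r"Certainty\s*=\s*(?P<value>[\d.\s]+?)\s*"
    r"\((?P<verdict>[^)]+)\)"
)

def parse_final_results(lines):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue
        # Remove stray spaces such as "0 . 4864" -> "0.4864".
        value = float(match.group("value").replace(" ", ""))
        results.append((match.group("location"), value, match.group("verdict")))
    return results

def best_location(results):
    """Pick the compartment with the highest certainty (hypothetical helper)."""
    return max(results, key=lambda item: item[1])[0]

example = [
    "bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>",
]
parsed = parse_final_results(example)
print(parsed)
print(best_location(parsed))
```

For the S.pyogenes protein above, this selects the membrane compartment, in agreement with the single "Affirmative" line.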



The protein has homology with the following sequences in the databases: 



Query: 74 IT!CEMSPELM)INGISVDKRIEQATSDPLAAAQAIDLQEHLISGYRSVDYQTELYQSYIK 133 

IT EM+P++ DI+G+ VD RI + T FLAAAQ ID EHLISGYRSV YQ ELY +YI 
Sbjct: 1 ITAEMNPDVTDIDGVKVDSRlAENTRKFLaflAQEIDSSEHLISGYESVAYQEELYNNYIA 50 

Query: 134 KEMANDPTLTQEAAEALVQTYSQPPGASEHHTGLAIDMSTVDTLNASDPSVAKAVQKIAP 193 

+E AN+P+L+QE A+ VQTYSQPPG+SEH TGLAIDMSTVD+LN SD +V V lAP 
Sbjct: 61 QEKANNPSLSQEEAQKQVQTYSQPPGSSEHQTGLAIDMSTVDSLNQSDANWAKVAAIAP 120 

Query: 194 DYGFVLRFPEGKKTSTGVDYEDWHYRYVGKASARYMAQHNLTLEEYIAALKEK 246 

YGFVERFPEGKK +TG+DYEDWHYRYVa SA+YM +H+MLEEY+ LKEK 
Sbjct: 121 KYGBTOJlFPEGKKmTGIDYEDWHYRyVGVKSaKYMTKHDLTLEEyLKKLKEK 173 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/235 (55%) , Positives = 172/235 (72%) , Gaps = 3/235 (1%) 

Query: 15 LLAILCF--SLFALLECPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVN 72 

LL ++ F L+ +KP + +Q L ++I++ +K ++ LP VS +DW L+LW 
Sbjct: 12 LLIVIVFLGGLYLFIKPEESWPTQ-IiJKKEIQQKDIKKrDRIBALPKySVEDM^ 70 

Query: 73 RDHKHEELSPDWPVENIYLDKRITKQATQFLEaARAIDaREHLISQYRSVAYQEKLENS 132 

RDH +E+SP++ + I +DKRI + + FL AA+AID 4-EHLISGYRSV YQ +L+ S 
Sbjct: 71 RDHITKEMSPELADINGISVDKRIEQATSDFLAAAQAIDLQEHLISGYRSVDYQTELYQS 130 

Query: 133 YVTQEOTSNEtniTRGQAEKLWTYSQPAGASEHQTGIAMDMSTVDSXJSIESDPRVVSQLKK 192 

Y+ +EM ++P LT+ AE LV+TYSQP GASEH TGLA+DMST\7D+LN SDP V ++K 
Sbjct: 131 YIKKEMaHDPTLTQEAAEmVQTYSQPPGASEHHTGIAIDMaTVDTLNASDPSVAKAVQK 190 

Query: 193 IAEQYGFVLRFPIX3KTAETGVGYEDWHYRYVGVESAK5rMAKHHLTLEEYITLLKE 247

lAP YGFVLRFP+GK TGV YEDWHYRYVG Sfi+YMA+H+LTLEEYI LKE
Sbjct: 191 lAPDYGFVLRFPEGKKTSTGVDYEDWHYRYVGKASaRYMaQHNLTLEEYlAALKE 245 
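
The alignment header above reports each statistic as a count over the aligned length. A minimal Python sketch for reading such a header is given below; the function names are hypothetical and introduced only for illustration. The identity percentage printed in this header is consistent with rounding the ratio down, which is an observation about the printed figure and not a statement of how the alignment program computes it.

```python
import re

# Matches "Identities = 131/235", "Positives = 172/235", "Gaps = 3/235".
HEADER_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_alignment_header(line):
    """Map each statistic name to a (count, aligned_length) pair."""
    return {name: (int(count), int(total))
            for name, count, total in HEADER_RE.findall(line)}

def truncated_percent(count, total):
    """Integer percentage, rounded down."""
    return count * 100 // total

header = "Identities = 131/235 (55%) , Positives = 172/235 (72%) , Gaps = 3/235 (1%)"
stats = parse_alignment_header(header)
identical, length = stats["Identities"]
print(identical, length, truncated_percent(identical, length))
```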

A related GBS gene <SEQ ID 9001> and protein <SEQ ID 9002> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 14.03
GvH: Signal Score (-7.5): -1.02

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 10.08 threshold: 0.0
PERIPHERAL Likelihood = 10.08 56
modified ALOM score: -2.52

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

33.7/55.1% over 183aa 

Enterococcus faecalis

EGAD|41322| d-alanyl-d-alanine carboxypeptidase Insert characterized
GP|1209528|gb|AAB05624.1||U35369 D,D-carboxypeptidase Insert characterized

ORF01266(484 - 1038 of 1350)

EGAD|41322|43646(85 - 268 of 268) d-alanyl-d-alanine carboxypeptidase {Enterococcus
faecalis} SP|Q47746|VANY_ENTFA D-ALANYL-D-ALANINE CARBOXYPEPTIDASE (EC 3.4.16.4)
(DD-PEPTIDASE) (DD-CARBOXYPEPTIDASE). GP|1209528|gb|AAB05624.1||U35369 D,D-carboxypeptidase
{Enterococcus faecalis}
%Match =10.1 

%Identity =33.7 %Similarity =55.1 

Matches = 63 Mismatches = 79 Conservative Sub.s = 40 

234 264 294 324 354 384 414 444 

SR*F*RWNIFYSIYWGYVLSRKiaa?NFRKNIAMKKNKIIRFSLVGVLLAILCFSLERLLKPNSQQSSSQKLE[iE 

MEKSNYHSNVNHHKRHMKQSGEKRAFLWAFI I SFTVCTLFLGWRLVSVLEATQLPPIPATHTGSGTGVREN 



mill 



EKLFNSYOTQEMTSNPNLTRGQAEKLVKTYSQPAGASEHQTGLAmMSTVDSLNESDPRWSQLKKIAPQYGPVLRFPDG 
::: : II : I 11= =1= I Mil I I I = I •• = I •- •- : I I : = ::||: |:| 
QEIMDEKV-AEYKAK-GYTSAQAJKaSftETWVAVPGTSEHQLGLAVDINA-DGIHSTGI^^ 

160 170 180 190 200 210 220 

948 978 1008 1038 1068 1098 1128 1158 

KTAETGVGYEDVraYRYVGVESAKYhaKHHLTLEEYITLLKENISIQ*G]WFPC*ILLLLLLFSFSLPPFRF* 
II III I 1111111=1=1 : : I 1111= I 
KTEITGVSNEPWHYRYVGIEAATKIYHQGLCLEEYLNTBK 



SEQ ID 6860 (GBS18) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 3; MW 31kDa).

The GBS18-His fusion product was purified (Figure 93A; see also Figure 189, lane 11) and used to
immunise mice (lane 2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure
93B), FACS (Figure 93C), and in the in vivo passive protection assay (Table III). These tests confirm that
the protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Example 2224 

A DNA sequence (GBSx2343) was identified in S.agalactiae <SEQ ID 6863> which encodes the amino 
acid sequence <SEQ ID 6864>. This protein is predicted to be unnamed protein product. Analysis of this 
protein sequence reveals the following: 

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.58 Transmembrane 10 - 26 ( 3 - 29)

Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 6865> which encodes the amino acid
sequence <SEQ ID 6866>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.83 Transmembrane 10 - 26 ( 4 - 33)

Final Results

bacterial membrane --- Certainty=0.5734 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AM300279 GB:U78599 putative N-acetyl-muramidase [Streptococcus mutans]
Identities = 66/150 (44%) , Positives = 97/150 (64%) , Gaps = 5/150 (3%) 

Query: 18 LLLIVCPLLSSQRIASADKEVRVHYSQI^FITKMGKEVKPLAKYYBIRPSILIAQILLET 77 

LL+I+ P+Hri-S +A A+K++ YS K+P+ ++ + L-l-K YG+R SI+I Q L++ 
Sbjct: 3 LLVILLPILASGGLAIIANKKMPSPYSHKEFVKBIAPTAQKLSKIYGVRSSIIIGQAALDS 62 

Query: 78 HDGKTLLASKYI^^.FSKKATPGQ^mITLKSPKQra---QNV--RYAIYK□DASAIRDYLR 132 

H G TLLASKXHNLFS +A+PGQ A+ LKS + N Q V RY +Y+ ++ DY+ 
Sbjct: 63 HFGSTLLASKYHNLFSIEASPGQGAVRLKSHEYKNGRMQEnmjRYLVYESWKESLYDYMA 122 

Query: 133 MLROGKEVDRRLYRNLATEKGYKAPAKSLQ 162 

+L K DK LY + T GYK A++LQ 
Sbjct: 123 ILHGraCIWDKALYTTMMTSSGYKTVARALQ 152 

An alignment of the GAS and GBS proteins is shown below.

Identities = 67/190 (35%) , Positives = 102/190 (53%) , Gaps = 1/190 (0%)

Query: 1 MRKRFSLLNFlWTFIFFFFILPPLLNHKGKVDANSRQSVTyTKEEFIQKIVPDAQDLGK 60 

MRKR F+ + F 14 PLL+ + A+ V Y++++FI K+ + + L K 

Sbjct: 1 MRKRLKFPYFLTLLACFLLLIVCPLLSSQRIASADKEVE™"SQKQFITKMGKEVKPLAK 60 

Query: 61 SYGIRPSFIIAQAALDSDFGEKILM<IKYHNLFGLIAEPGTPSITLNDSSTGKKQEKQFTH 120 

YGIRPS +IAQ L++ G+ +LA+KYHNLF A PG -t-ITL S Q ++ 

Sbjct: 61 YYGIRPSILIAQILLETHDGKTLLASKYHNLFSKKATPGQVAITLK-SPKQTNQNVRYAl 119

Sbjct: 120 YKDDASAira^ljRMLRQGKETOKaj^nattiftTEKGYKA^ 113

Query: 181 IDLYDLTRID 190 

1+ DLT IfD 
Sbjct: 180 lESMDLTNXD 189 

SEQ ID 6864 (GBS246) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 61 (lane 7; MW 24.6kDa).

GBS246d was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 154 (lanes 14 & 15; MW 21kDa) and in Figure 183 (lane 4; MW 21kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
187 (lane 12; MW 46kDa). Purified GBS246d-GST is shown in Figure 243, lane 12. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2225 

A DNA sequence (GBSx2344) was identified in S.agalactiae <SEQ ID 6867> which encodes the amino 
acid sequence <SEQ ID 6868>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2541 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45610 GB:U78296 repressor of class I heat shock gene
expression HrcA [Streptococcus mutans] 
Identities = 227/345 (65%) , Positives = 287/345 (82%) , Gaps = 1/345 (0%) 



++II1DVEP +QRIiT FD+-V+IiSNHDRLA.V+TLDE+ PVTVQPAIP+NFIi DL+

ER L+ +V+DIHy+URTE PQI+(2KyP TDNVL LFD++F+ +F E VF++GK+ +I1

\- L TYQFL-l-H Q VA+ +RQSL E E+ VQVADS+E +LaD++V++ KFLIPyRGFG

Query: 17
Sbjct: 1
Sbjct: SI
Query: 136
Sbjct: 121
Query: 196
Sbjct: 181
Query: 256
Sbjct: 241
3ie
Sbjct: 301

A related DNA sequence was identified in S.pyogenes <SEQ ID 6869> which encodes the amino acid
sequence <SEQ ID 6870>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0695 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 341/344 (99%) , Positives = 343/344 (99%)



Query: 17 VITQRQNDILNLIVELFTQTHEEVGSKALQRTIDSSSATIBNDMAKLEKLGLLEKAHTSS 76 
Sbjct: 

Query: 77 GRMPSPAGFKYFVEHSLRr£)SIDEQDIYHVIKAFDEEaFKLEDMLQK2\SHlLSEMTG5rTS 136 

GRMPSPAGFKYFVEHSLRLDSIDEQDlYIOTKRFDFEaFKIiEBMLQKRSHIL+EMTCOTS 
Sbjct: 61 GRMPSPAGFKYFVEHSLRLDSIDEQDIYEIVIKRFDEEAFKLEDMLQKASHILAEMTGYTS 120 

Query: 137 VILDVEPARQRIjTGFDWQIjSNHDMiAVrm^DESKPVTVQFAIPRNELTRDI.IAFKAITO 196 

VILDVEPARQRLTGFDWQLSNHDALAVMTLDESKPVTVQPAIPRNFLTRDLIAFKAIVE 
Sbjct: 121 VILDVEPARQRLTGFDWQLSNHDALAVMTLDESKPVTVQFAIPRNFLTRDLIAFKA.IVE 180 

Query: 197 ERLLDGSVlvDIHYKLRTEIPQIVQKYFVTTDNVLQLFDWFSELFLETVFXrAGKVNSLTY 256 

ERLLD SV+DIHYKLRTEIPQIVQKYFVTTDNVLQLFDyVFSELFLETVF\rAGKVNSLTY 
Sbjct: 181 ERLLDNSVIDIHYKLRTEIPQIVQKYFVTTDNVLQLFDYVFSELFLETVFVAGKVNSLTY 240 

Query: 257 SDLSTYQFLnNEQQVAISLRQSLKEGEMASVQVADSQEAALADVSVLTHKFLIPYRGPGL 316 

SDLSTYQFt£(NEQQVAISLRQSLKEGEMASVQVADSQEaaLaDVSVLTHKFLIPYRGPGL 
Sbjct: 241 SDLSTYQFj™EQQVAISLRQSIiKEGEMASVQVaDSQEA2aJ^SVKCHKFLIPYRGPQL 300 

Query: 317 LSLIGPIDMDTOlSVSLVNIIGKVIAAKLGDYyRraNSNHYEVH 360 

LSLIGPIDMimiRSVSLVNIIGKVLAAKLGDYYRmNSNHYEVH 
Sbjct: 301 LSLIGPIDMDYRRSVSLWIIGKOTAAKLGDYYRYONSNHTEVH 344 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2226 

A DNA sequence (GBSx2345) was identified in S.agalactiae <SEQ ID 6871> which encodes the amino
acid sequence <SEQ ID 6872>. This protein is predicted to be grpE protein (grpE). Analysis of this protein
sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45611 GB:U78296 GrpE [Streptococcus mutans] 
Identities = 130/180 (72%) , Positives = 151/180 (83%) , Gaps = 3/180 (1%) 

Query: 14 VSEEIKKDDLQEEVEATE — TEETVEEVIEEIPEKSELELANERADEFENKXIiRftHaEM- 70 
+S++ KK++ +EEVEATE TEE+VEEV BE E EL+ A ERA++FENKYIiRSHaEM

Sbjct: 1 MSKKDKKEEYKEEVEATEPTTEESVEEVAEETSENKELQEAliERAEDFENKILRAHAEMP 60

Query: 71 QKIQRRSSEERQQLQRYRSQDLAKAILPSriDNLERAIAVEGLTDDVKKGLEMTRDSLIQR 130
+ + + QRYRSQDL KAIIiPSLDNLERALAVEGLTDDVKKGLEM ++SLIQA

Sbjct: 61 KTFSVATMKSDKVCQRYRSQDLRKAILPSLDNtERAIAVEGLTDIWKKGLEMVQESLIQA 120

Query: 131 LKEEGVESWEVDSFDHNPHMRVQTtPMDEHPMSIAEOTQKBYKLHERLLRPMVV^ 190

LKEEGVEEVE+++FD N HMAVQTL ADD+HPADSIA+V QKGY+LHERLLRPAMVWYN 
Sbjct: 121 LKEEGVEEVELENFDANLHMRVQTLDMDDHPMSIAQVHQKGYQmERLLRPJWIVVV™ 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6873> which encodes the amino acid 
sequence <SEQ ID 6874>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 189/190 (99%) , Positives = 189/190 (99%) 



Query: 1 MAVFNKLFKRRHSVSEEIKKDDLQGEVEATETGETVEEVIEBIPEKSBIiEIiAHERADEFE 60 

MAVENKLFKRRHSVSEEIKKDDLQEEVEATETEETVEEVIEE PEKSELEIAHERADEFE 
Sbjct: 1 MAVEraCr.FKRRHSVSEEIKKDDIiQEEVEaTETEETVEEVIEETEEKSEI,ELaMERSDEEE 60 

Query: 61 NKXLRftHAEMQNIQRRSSEERQQLQRYRSQDrMCAILPSI^LERMi^^ 120 

WKXIiRAHJiEMCJNIQRRSSEHJQQLQRYRSQDLaKAILPSLDOT^ERaiiaVEGLTDOT^ 
Sbjct: 61 NXYLRAHIffiMQNXQRRSSEERQQLQRYRSQDLAKAILPSIiDNLERAIAVEGLTDDVKKGIi 120 

Query: 121 EiyrrRDSLIQai.KEEX3VEEVEOTSFDHNFHiyiaVQTLPADDEHPADSIAEV^ 180 

E^raa3SLIQaLKEEGVEEVEVDSFDHNEBMaVQTLPADDEHPADSIfiEVFQKBYKIJ^ERL 
Sbjct: 121 EMTRDSLIQALKEEGVEEVEVDSPDHNFHMAVQTLPADDEHPADSIAEVFQKGYKLHERL 180 

Query: 181 LRPAMVWYN 190 

LRPAMWVYN 
Sbjct: 181 LRPAMVWYN 190 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2227 

A DNA sequence (GBSx2346) was identified in S.agalactiae <SEQ ID 6875> which encodes the amino 
acid sequence <SEQ ID 6876>. This protein is predicted to be heat shock protein 70 (dnaK). Analysis of
this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0996 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 6877> which encodes the amino acid 
sequence <SEQ ID 6878>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0996 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 594/609 (97%) , Positives = 603/609 (98%) , Gaps = 1/609 (0%)

Query: 1 MSKIIGIDLGTTNSAVAVLESTESKIIANPEGNRTTPSWSFKNGEIIVGDAAKRQAVTN 60 ' 

MSKIIGIDLGTTNSAVAVLEGTESKIIANPEGNRTTPSWSFKNGEIIVGDAAKRQAVTN 
Sbjct: 1 MSKIIGIDICT'aNSAVAVLESTESKIIflNPEGNRTTPSVVSFKNGEIIVGDAaKRQAVTN 60 

Query: 61 PDWISIKSKMGTSEKVSANGKEYTPQEISAMILQYLRGYftEDYLGEKVEKAVITVPAYP 120 

P+TVISIKSKMGTSEKVSANGKEYTPQBISAMILQYLKGYAEDyijGEKVEKAVITVPAYF 
Sbjct: 61 PETVISIKSKMGTSEKVaMGKEYTPQEISAMILQYLKGYAEDYLGEKVEKAVITVPAYF 120 

Query: 121 HDAQRQATKDAGKIAGLEVERIVNEPTAAAIAYGMDKTDKDEKILYFDLGGGTFDVSILE 180 

NDAQRQATKDAGKIAGLEVERIVNEPTAAALAYGMDKTDKDEKILVFDLGGGTFDVSILE 
Sbjct: 121 NDAQRQATKDAGKIAGLEVERIVNEPTAAALAYGMDKTDKDEKILVFDLGGGTFDVSILE 180 

Query: 181 LGDGVFDVLaTAGDNKLGGDDFDQKIIDFLVEEFKKENGIDLSQDKMALQRLKDAAEKAK 240 

LGDGVFDVLftTAGDNKLGGDDFDQKIIDFLV EFKKENGIDLSQDKMALQRLKDAAEKAK 
Sbjct: 181 LGDGWDVIATAGDNKLGGDDPDQKIIDFLVaEFKKENGIDLSQDKMALQRLKDAMKAK 240 

Query: 241 KDLSGVTQTQISLPFlTAGSAGPLHLEMSLSRAKFDDLTRDLVERTKrPVRQflLSDAGLS 300 

KDLSGVTQTOISLPFITAGSAGPUmEMSLSRAKPDDLTRDLVERTKTPVRQALSDAGL^ 
Sbjct: 241 KDLSGVTQTQISLPFITAGSAGPLHLEMSLSRAKFDDLTRDLVERTRrPVRQaLSnaGLS 300 

Query: 301 LSEIDEVILVGGSTRIPAVVBAVKRETGKEPNKSVNPDEVVAMGAAIQGGVITGDVKDW 360 

LSEIDEVILVGGSTRIPAWEAVKAETGKEPNKSVNPDEWAMGRAIQGGVITGDVKDW 
Sbjct: 301 LSEIDEVILVGGSTRIPAWEAVKAETGKEPNKSVNPDEWAMGAAIQGGVITGDVKDW 360 

Query: 361 LLDVTPLSLGIETMGGVFTKLIDRNTTIPTSKSQVFSTAADNQPAVDIHVLQGERPMAAD 420 

LLDVTPLSLGIETMGGVPTKLIDRNTTIPTSKSQVFSTAADNQPAVDIHVLQGERPMAAD 
Sbjct: 361 LLDVTPLSLGIETMGGVFTKLIDRNTTIPTSKSQVFSTAADNQPAVDIHVLQGERPMAAD 420 

Query: 421 NKTLGRFQLTDIPAAPRGIPQIEVTFDIDKNGIVSVKaKDLGTQKEQHIVIQSMSGLTDE 480 

NKTLGRFQLTDIPAAPRGIPQrEVTFDIDKKGIVSVKaKDLGTQKEQHIVI+SN GL++E 
Sbjct: 421 NKTLGRFQLTDIPAAPRGIPQIEVTFDIDKNGIVSVKaKDLGTQKEQHIVIKSNDGLSEE 480 

Query: 481 EIDKMMKDaEANaEaDAKRKEEVDLKNEVDQAIFATEKTIKETEGKBFDTBRK^ 540 

EID+l!mKnaEaKaEflIJAIa^KEEVDLK^IEVDQAIFATECTIKETEGKGFI^ 
Sbjct: 481 EIDRmKnAEaNaEADAlOlKEEVDLKNEVDQAIFATEKTIKETEGKBFD^ 540 

Query: 541 ELKKftQESGNLDDMKAKLEALNEKaQaiAVKLYEQaaAAQQAAQGftEGftQSADSSSK^^ 600 

ELK AQESGISILDDMKAK]:iEAIiNEKAQAIiAVK+YEQAAAAQQ%AQGAEGAQ+ DS++ DD 
Sbjct: 541 ELKAAQESGa!n:DDMKAKLE2piEKAQAIAVmYEQAA&AQQ%AQQ2VEGA 599 

Query: 601 WDGEFTEK 609 

WDGEFTEK 
Sbjct: 600 WDGEFTEK 608 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2228 

A DNA sequence (GBSx2347) was identified in S.agalactiae <SEQ ID 6879> which encodes the amino 
acid sequence <SEQ ID 6880>. This protein is predicted to be Streptococcus pneumoniae DnaJ protein 
homologue (dnaJ). Analysis of this protein sequence reveals the following: 
Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4180 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 6881> which encodes the amino acid
sequence <SEQ ID 6882>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1322 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 330/377 (87%), Positives = 357/377 (94%), Gaps = 1/377 (0%) 



Query: 1 MNNTEBYDRLGVSKDASQDEIKKAYRRMSKKyHPDINKETGAEEKYKEVQEAYETLSDTQ 60 

MNNTE+YDRLGVSKDASQD+IKKAYR+MSKKYHPDINKE GAE+KYK+VQEAYETLSD+Q 
Sbjct: 19 MNNTEYYDRLGVSKDASQDDIKKAYRKMSKKYHPDINKEAGAEQKYKDVQEAYETLSDSQ 78 



Sbjct: 79 KRaAYDQYG3»GAQGGFG6-GSVGGFGGEDGGGFGGFEDIFSSFR3GGGSRNENAPRQGDD 137

Query: …PAGVETGQQIRLTGQG 240

PLG MRRQVTCD+C GSG+EIKE C TCHGTGHEK+ HKVSVKIPAGVETGQQIRL G(3G
Sbjct: 198 PLGMMRRQVTCDICHGSGKEIKEPCQTCHGTGHEKQAHKVSVKIPAGVETGQQIRLQGQG 257

Query: 241 EAGFNGGPYGDLFVIINVLPSQQFERNGSTIYYTIJJISFVQAALGDTIDIPTVHGAVEMS 300

EAGF'NGGPYGDIIFVI+NVLPS+QFERNGSTIYY L+ISF QAFLLGDT++LPTVHG VEM+
Sbjct: 258 EAGFNGGPYGDLFVIMJVLPSKQFERMGSTIYYNLDISFTQAALGDTVEIPTVHGDWMA 317

Query: 301 IPAGTQTGKTFRLRGKGAPKMGGGQGDQHVTVNIVTPTKIINDAQKEALHAFAEASGDKM 360

LPAGTQTGKTBTLL+GRGA^KMGGG<X3DQHVTVNIOTPTKI^^ AFAEASG+KM
Sbjct: 318 IPAGTQTGKTFRLKGKtSMKmGGGQGDQHVTWIOTPTKIiHnaQREA^ 377

Query: 361 VHPKKKGFFDKVKDALD 377

+HPKKKiGFFDKVKDRL+
Sbjct: 378 IiHPKKRGFFDKVKDaLE 394


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2229 

A DNA sequence (GBSx2348) was identified in S.agalactiae <SEQ ID 6883> which encodes the amino
acid sequence <SEQ ID 6884>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 281 - 297 ( 281 - 297)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD24445 GB:AF118389 unknown [Streptococcus suis] 
Identities = 182/373 (48%) , Positives = 257/373 (68%) , Gaps = 5/373 (1%)

Query: 4 KVEEIRSYLIASIQNGKLAPGDRLPSIRQIANQFSCNKDTVQRVLMELRFDNYIYAKPRS 63

K + I ++ 1+ + G++LPSIRQL Q+ C+KDTVQ+ ++EL++ N lYA +S 
Sbjct : 3 KYQVIIQDILTGIEEHRFKRGEICLPSIRQLREQYHCSKDTVQKflMLELKYQNKIYAVEKS 62 ' ' ' 

Query: 64 GYYVFDSHQEEVEEGVSLPNSEIAl^IAYDDFRLCLNETLIGREDYLFUYYYRQEGLLDLS 123 

GYY+ + + + + ++ I Y+DFR+CL E+LIGRE+YLF1IYY++QEGL +L 

Sbjct: 63 GyYII.EDRDPQ-DHTCRAQSYELSRITYEDFRICLKESLIGRENYLENYYHQQEGLAELI 121 

Query: 124 KAVZiKIMETGVYVP™iVITASTQQALFILT(m'FEl!reKSRVLIEEPTYP^ 183 

+V L+ + VY D +V1TAG+QQAL+ILTQ+ K+ +LIE PTY RMIELI+ 

Sbjct: 122 SSVQSLIMJYHVYTKraQLVITAGSQQALYILTQIffiTLAGKTEILIENPTYSRMIELIRH 131 

Query: 184 QNLPYETISEGTHGIDFQRLEEIFQTQSIKFFYVIPRMHNPIXSTSYNPVEMKRLIEMAEK 243 

Q +Py+TI R GID + LE IFQT IKFFY IPR+HNPLG++Y+ ++++A++ 
Sbjct: 182 QGIPYQTIERNLDGIDLEELESIFQTGKIKFFYTIPRLHNPLGSTYDIATKTAIVKLAKQ 241 

Query: 244 YDVYIVEDDYMSDFASQS- -PLHYYDTHGRVIYLKSFSKAIFPALRIiAAICLPQAlKSTF 301 

YDVYI+EDDY++DF S PLHY DT RVIY+KSF+ +FPALR+ AI LP L+ F 
Sbjct: 242 YDVYIIEDDYLADFDSSHSLPLHYLDTDNRVIYIKSFTPTLFPALRIGAISLPNQLRDIF 301 

Query: 302 MAYKKLMDYD™ijILQI<ALALYIENGLYAKNSQYLKYRyQKDLANSKSILADHP-NLPSY 360 

+ +K L+DYDTNLI+QKAIi+LYI+NG+H-A+N+Q+L + Y K L + N+P Y 

Sbjct: 302 IKHKSLIDYDTHLIMQKaLSLyIDNG^WAIa!^'QE^IJmIYHAQWNKIKDCLEKS^^ 360 

Query: 361 SLHHDSVLFDCSK 373 

+ SV F SK 
Sbjct: 361 RIPKGSVTFQLSK, 373 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6885> which encodes the amino acid 
sequence <SEQ ID 6886>. Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3043 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 176/382 (46%) , Positives = 255/382 (66%) , Gaps = 7/382 (1%) 

Query: 1 ^OTKVEEIRSYLIASIQNGKIJ^DRLPSIRQL2aJQFSCNKDTVQRVIl^lKrlRF^NYIYAK 60 

M TK + I S + IQ +L GD+LPSIR L-l- + C+KDTVQR L4-EL++ + lYA 
Sbjct: 1 MTTKYQTIISNIEQDIQKQRLKKGDKLPSIRVLSKVYYCSKDTVQRALLELKXRHLIYAV 60 

Query: 61 PRSGYYVFDSHQEEVEEGVSLENSEIANIAYDDFRLCUilETr.IGREDYLENYYYRQEGLL 120 

P+SGVYV + + ++L + N4AY+DFRLCLNE L ++ YLP+YY++ EGL 
Sbjct: 61 PKSGYYVL-QNVSMPDNV]aaSLEDYlm^aYEDFRLCl:^J^ 119 

Query: 121 DLSKAVAIMffiETGVYVPLDDIVITAGrQQRLFILTQOTFPNRKSRVLIEEPTYPRMIEL 180 

+L +A+ + E VY D ++IT+GTQQaL+IL-HQ+ FEN +L+E+PTY RM + 
Sbjct: 120 ELREALLLYIiRENSVYSNKDQLLITSGTQQALYILSQMPFPNTGKTILLEKPTYHRMEAI 179 

Query: 181 IKTQNLPYETISRGTHGIDFQRLEEIFQTQSIKFFYVIPRMHNPLGTSYIIPVEMKRLIEM 240 

+ LPY+TISR +G+D + LE +FQT IKFFY I R +PLG SY+ E + ++ + 
Sbjct: ISO VAQLGLPYQTISRHENGLDliELLESLFQTGDIKFFYTISRFSHPLGLSYSTKEKEAIVRL 239 

Query: 241 AEKyDVYIVEDDYMSDFA--SQSPLHYYDTHGRVIYLKSFSKAIFPALRLAAICLPQALK 298 

A++Y vyi+EDDY+ DF + P+HYYDTH R+IYLKSFS ++FPALR+ A+ LP LK 
Sbjct: 240 AQRYQVYILEDDYLGDFVKLKEPPIHYYDTHHRIIYLKSFSMSVFPALRIGALVLPSGLK 299 

Query: 299 STF^aYraCL^roYDaraILQKALALYIENGLYAKHSQXLIaRYQKDLaNSKSIIMH 358 

F+ K L+D DTOL++QKaLALY+ENG++ KN H-++K RY K ++ N P 

Sbjct: 300 PHFLTQKSLIDI£)TNLIMQKaLALYLENGMFQKNLRPIK-RYLKQRERQLALFLKiQ-NCP 357

Query: 359 S- -YSLHHDSVLFDCSKLDNFK 378

Y L •++ D + D+++ 
Sbjct: 358 DIHYQLTPTHLVIDYTTSDSYR 379 '. ' 

SEQ ID 6884 (GBS423) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 7; MW 49.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 172 (lane 2; MW 74kDa). 

GBS423-GST was purified as shown in Figure 219, lane 2-3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2230 

A DNA sequence (GBSx2349) was identified in S.agalactiae <SEQ ID 6887> which encodes the amino 
acid sequence <SEQ ID 6888>. This protein is predicted to be pseudouridylate synthase I (truA). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3265 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB0388e GB:AP001507 tRNA pseudouridine synthase A
(pseudouridylate synthase I) [Bacillus halodurans]

Identities = 105/240 (43%) , Positives = 147/240 (60%) , Gaps = 2/240 (0%) 

Query: 1 MTRYKAQISYDGSAFSGFQRQPNCRTVQEEIERTLKRIiNSGNDVIIHGAGRTDVGVHAYG 60 
M R +++YDG+ F+G+Q QPN RTVQ E+E LK ++ G + + +GRTD GVHA G 
Sbjct: 1 MKRIGLKVAYDGTDFAGYQIQPNERTVQGEIiESVLKNIHKGMSIRVTASGRTDTGVHARG 60

Query: 61 QVIHFDLPQARDVEKLRFGLDTQCPDDIDIVKVEQVSDDEHCRYDKHIKTYEFLVDIGRP 120 

Q++HFD + V++ L++Q P DI +++ V DFH RY K Y + V 
Sbjct: 61 QIVHFDTSLSPPVDRWPIAMJSQLPADICVLEAADVPADEHaRYSAKTKEYRYRVLTSAQ 120

Query: 121 KMEWMRWYATHYPYPVIIEIJIQE&IKDLVGTHDFTGFTASGTSVENKVRTIE^^ 180 

■+ RNY H YP+ +E MQ A L+GTHDF+ F A+ VE+KVRTI D + E 
Sbjct: 121 ADVFREim"YHVRYPIiDVE2iMQRAAVQLI<3THDPSSFCSiAK24EVEDKVRTIE^ 180 

Query: 181 SKia.LIFTFTONGFLYKQVEHMVGTLIJCimGRMPISQIKTILQaiaiRDU^ 240

+ LIF+ GlilGFLY VR +VGTLL+IG G+ ++ IL A++R+ AG TA G+GL 
Sbjct: 181 DE— LIFSIRGNGFLYMMVRIIVGTIiLEIGAGKRSAEEVAKILAaRSRERAGKTAPGHGL 238 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6889> which encodes the amino acid
sequence <SEQ ID 6890>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2558 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 184/249 (73%) , Positives = 214/249 (85%)



Query: 1 MTRYKAQISYDGSAFSGFQRQPNCRTVQEEIERTLKEIiHSGNDVIIHGAGRTDVGVHAYG 60 
M RYKA ISYDG+ FSGFQRQ + RTVQEEIE+TL +LN+G +IIHGAGRTD GVHAYG 
Sbjct: 1 MVRYKATISYDGTLFSGFQRQRHLRTVQEEIEKTLYKMNGTEailHGAGRTDAGVHAYG 60

Query: 61 QVIHFDLPQaRDVEKLRFGLDTQCPDDIDIVKVEQVSDDFHC3RYDKHIICrYEFLVDIGRP 120

QVIHFDLPQ ++VEKI1RF LDTQ P+DID+V +E+V+DDPHCRY KH+KTYEPLVD GRP
Sbjct: 61 aVIHFDLPQEQEVEKIiREaLr)TQTPEDIIWVNIEKVM)DFHCRYQKHLKTyEFLV^

Query: 121 KNPMMRNYATHYFYFVIIELMQEAIKDLVGTEroFTGFTASGTSVENKVRTIFDAKIQFEA 180

KKPMMR+Y THYPy + I+LMQEAI HiVGTHDFTGFTA+GTSV+NKVRTI A + +
Sbjct: 121 KKPRIMRHXTTHYPYTIMrKLMQEAINeLTOTHDFTGFTAAGTSVCJ^^ 180

Query: 181 SKbaLIETFTGNGFLYKQVraSIMVGTLLKIGiraRMPISQIKTILQAK^ 240

+ L+FTF+GNGFLYKjQVRISIMVGTLLKIGNG+MP+ Q+K IL H-KNR LAGPT +GNGL
Sbjct: 181 KTDFLVFTFSGNGFLYKQVKNMVGTLLKIGNGQMPVEQV^ 240

Query: 241 YLKEIIYED 249

YLKEI YE+
Sbjct: 241 YLKEICYEN 249
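
The match line between each Query and Sbjct row marks identical residues with the residue letter and conservative substitutions with "+". The following minimal Python sketch counts both for a single row, assuming the three rows have already been cleaned to equal length; `count_matches` is a hypothetical helper, shown here on the final, cleanly legible row of the alignment above.

```python
def count_matches(query, match_line, subject):
    """Count identical and positive columns in one alignment row.

    Identities are columns where both residues agree; positives are
    identities plus the columns flagged '+' on the match line.
    """
    if not (len(query) == len(match_line) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    positives = identities + match_line.count("+")
    return identities, positives

# Final row of the GAS/GBS alignment (residues 241-249).
print(count_matches("YLKEIIYED", "YLKEI YE+", "YLKEICYEN"))
```

Summing these counts over all rows of an alignment would give the numerators reported in its "Identities" and "Positives" header, provided the rows are free of scanning errors.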





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2231

A DNA sequence (GBSx2350) was identified in S.agalactiae <SEQ ID 6891> which encodes the amino
acid sequence <SEQ ID 6892>. This protein is predicted to be phosphomethylpyrimidine kinase (thiD).
Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2051 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15828 GB:Z99123 phosphomethylpyrimidine kinase [Bacillus subtilis]

Identities = 95/253 (37%) , Positives = 150/253 (58%) , Gaps = 13/253 (5%) 

Query: 1 MKTRNVLAISGNDIFSGGGLHADLATYWNKLHGFVAVTCLTAMSDKG---FEVIPIEAS 57 

M L I+G+D G G+ ADL T+ ++0 A+T + AM +V PI+ 

Sbjct: 1 MSMHKALTIAGSDSSGGAGIQADLKTFQEKNVYGMTALTVIVAMDPNNSWNHQVFPIDTD 60 

Query: 58 ILKQQLESLKD-VEFGSIKLGLLENVETAQWLEFVKSKQECPWLDPVLVCKENHDL- - 114 

++ QL ++ D + ++K G+LP V+ ++ + +K KQ W+DPV+VCK +++ 
Sbjct: 61 TIRftQIJ^TITDGIGVDAMKTGMLPTVDlIELAAKTIKEKQLKNVVIDPVWCKBA^ 120 

Query: 115 --EVSQLREQLIAFFPYADVITPHEiVEAQLLTGLS-IENLDQMKIAaEKLYDMGaKHVVI 171 

LREQL P A VITPWL Eft L+G+ ++ +D M AA+K++ +C3A++WI 
Sbjct: 121 PEHAQRLEEQIA---PLRTVITElSmFEaSQLSGiroELKTVDDMIEAAKKIHALGa^ 177 

Query: 172 KBGNRIliaEEATDLYYIX3ERFETWFPVVDaHOT-GftSCTFASSIASQIi^ 230 

GG +L E+A D+ YDGK E ++D T GAGCTF++++ ++IjA G V++A+ 

Sbjct: 178 TGGGKLKHEKAVDVLTO3ETAEVLESEMIDTPYTHG&GCTFSAAVTAELAKGAEVKEAIY 237 

Query: 231 MSKBFVYQAIKAS 243 

+K F+' AIK S 
Sbjct: 238 AAKEFITAAIKES 250

A related DNA sequence was identified in S.pyogenes <SEQ ID 4407> which encodes the amino acid
sequence <SEQ ID 4408>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2029 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 135/252 (53%) , Positives = 174/252 (68%) 

Query: 1 MKirara:aiSGNDIFSGGGLffiUDIiATYVVNKIHGFV^ 60 

MKT ++ ISGNDI SGGGL+ADIATY+ L FVAVTCLT S++GF + P+ I + 
Sbjct: 1 MKTDYIVTISGIIDILSGGGLYADLATYIRYDLQAFVAVTCLTTRSEEGFSLFPV?yCEIFR 60 

Query: 61 QQLESLKDVEFGSIKLGLLPNVETAQWLEFVKSKQECPWLDPVLVCKENHDLEVSQLR 120 

QL S + +IK+GLI.PN E 4+VL+F+K PWLDPVL CKE D+++ LR 

Sbjct: 61 DQLNSFTNAPISAlKIGLLPNAEMCEIVLDFIKGHLGIPi/VLDPVLACKEIDDVKIVPLR 120 

Query: 121 EQLIAFFPYflDVITPHLVEAQIi,TGLSIENLDCMKIAAEKLYDMGAKHVVIKGGNRLNflE 180 

++++ Py V+TPNI.VEAQLL+ I +L M+ AA+ Y +GAK WIKGGNR + + 
Sbjct: 121 QEILQI^PYVTVVTEISnJVESiQU^SQKEIVSLKDMQEaaKYFYQLGaKQV^ 180 

Query: 181 EATDLYYDGERFETYVFPWDaNNTGMCTFAjSSIASQIAMGKN^ 240 

+A DL+YDG+ T PV++ NN GaGCTFASSIASQL K +AVK SK VXQM 
Sbjct: 181 KAIDLFYIMKEIVTLECPVLEKNNIGAGCTFJ^SIASQLVKKKTPLBAVKNSKELWQA^ 240 

Query: 241 KASDKYGWQHF 252 

SD+YGV Q + 
Sbjct: 241 LQSDRYGVKQSY 252 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2232 

A DNA sequence (GBSx2351) was identified in S.agalactiae <SEQ ID 6893> which encodes the amino 
acid sequence <SEQ ID 6894>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.05 Transmembrane 97 - 113 ( 96 - 119)
INTEGRAL Likelihood = -0.22 Transmembrane 54 - 70 ( 54 - 70)

Final Results

bacterial membrane --- Certainty=0.3421 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
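
Each INTEGRAL line in the analysis above carries a likelihood value and two residue ranges for a predicted transmembrane segment. A minimal Python sketch for extracting these fields follows. The field names `span` and `outer_span` are hypothetical labels for the first and the parenthesised range, since the text does not name them. In this section, more negative likelihood values accompany higher membrane certainty, so the sketch treats the most negative value as the strongest prediction; that ordering is an assumption drawn from the values shown here.

```python
import re

INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    """Return likelihood and residue ranges for one INTEGRAL line, or None."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None
    likelihood, start, end, outer_start, outer_end = match.groups()
    return {
        "likelihood": float(likelihood),
        "span": (int(start), int(end)),                    # hypothetical label
        "outer_span": (int(outer_start), int(outer_end)),  # hypothetical label
    }

lines = [
    "INTEGRAL Likelihood = -6.05 Transmembrane 97 - 113 ( 96 - 119)",
    "INTEGRAL Likelihood = -0.22 Transmembrane 54 - 70 ( 54 - 70)",
]
segments = [seg for seg in map(parse_integral, lines) if seg is not None]
# Assumption: the most negative likelihood marks the strongest segment.
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(strongest)
```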

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA30952 GB:AP000007 202aa long hypothetical protein [Pyrococcus 
horikoshii] 

Identities = 48/148 (32%) , Positives = 73/148 (52%) , Gaps = 9/148 (6%)

Query: 10 VQLAIVTAISIVLGMFISIPTPTGFLTLLDAGIFFAAFYFGKKEC3AWGALAGFLIDLLK 69 ' 

V A+VTA+++V+ I IP G+L D I + FG G G + DLL 
Sbjct: 49 VMAALVTAMTMVIR--IPIPASCGYLNFGDIMIKLTSVLFGPLVGGFAGGVGSAPADLL- 105 
Sbjct: 106 GYPSWALFTLVIKGTEGIIVGYFSKGEANYGKILLGTVLGGSVMVIGYVSVaYVLYGPAG 165

Query: 124 VLPDIPGNIMQIMVGMVVGFAIiNKSLKR 151 
+ ++ +I+Q + G+V+G L L++ . 
Sbjct: 166 AIGELYNDIVQAVSGIVIGGGLGYILKK 193

A related DNA sequence was identified in S.pyogenes <SEQ ID 6895> which encodes the amino acid 
sequence <SEQ ID 6896>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.62 Transmembrane 98 - 114 ( 97 - 119) 
INTEGRAL Likelihood = -0.00 Transmembrane 135 - 151 ( 135 - 151)

Final Results

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB49310 GB:AJ248284 hypothetical protein [Pyrococcus abyssi]
Identities = 42/145 (28%) , Positives = 73/145 (49%) , Gaps = 10/145 (6%) 

Query: 7 RC3MSLTSILTALVWLGRFVMLPTPT--GFLTLLDAGryAVSFSPGSAQGAIVGGLSGPL 64 

R ++++ + ALV + + +P P G+L D I V+ FG G GG+ + 
Sbjct: 39 RTVAISAVAAALVTAMTlWIRIPIPASQGYLNFGDIMimVAVLFGPLVGGFAGGVGSAI 98 

Query: 65 IDLVAGYEQWMFHSLIAHSVQGYFAGWRGR KRWLGWIGSFIMIFWYFLGSLML 118 

DL+ GYP W +r,I +G G+ + K +G V+G FIM+ Y S +L 

Sbjct: 99 ADLI-GYPSWALFTLIIKGSEGLWGYPSKGEPNYSKILIGTVLGGFIMVLGYVSVSYVL 157

Query: 119 GYGLSGSLAGIWGNVMQNTLGLFVG 143 

YG +G+++ ++ + +Q G+ +G 
Sbjct: 158 -YGPAGAISELYHDTVQAVSGIVIG 181 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/155 (49%), Positives = 106/155 (67%), Gaps = 1/155 (0%) 

Query: 1 MRKEKTSQLVQLAIVTAISIYLGMFISIPTPT3FLTLLDAGIFFAAFYFGKKEGAWGAL 60 

M+ K Q+ I+TA+ +VLG F+ +PTPTGFLTLLDAGI+ +F FG +GA+VG L 
Sbjct: 1 MQNSKIRQMSLTGILTALWVLGRFVMLPTPTGPLTLLDAGIYAVSFSFGSAQGAIVGGL 60 

Query: 61 AGFLIDLLKGYPNWMFFSLLIHGTQGYLAGLPGRRRLWLISATLVMVLGYAIASGLM-Y 119 

+GFLIDL+ GYP WMF SL+ H QGY AG Oi+R LG++ + +M+ Y + S ++ Y 
Sbjct: 61 SGFLIDLVAGYPQWMFHSLIAHSVQGYFAGWRGRKRWLGWIGSFIMIFWYFLGSLMLGY 120 



Query: 120 GMC3A^^^PDIPGKIMQMTOG^WVGFAIJSIKSLBRVKK 154 

G L I GK+MQN +G+ VGF + K++ R KK 
Sbjct: 121 GLSG8LAGIWGNVMQNTLGLFVGFIIFKAILRQKK 155 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2233 

A DNA sequence (GBSx2352) was identified in S.agalactiae <SEQ ID 6897> which encodes the amino 
acid sequence <SEQ ID 6898>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0381 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.



Query: 6 NKLKQETKAIWDIIERSALKKGQIFVLGLSSSEVSGCSLIGKNSSSEIGEIIA/EVILKEL 65

N+LKQ K ++ + +++ LK+ Q+FVLG S+SEV+G IG + S +1 E I + + 
Sbjct: 2 NELKQTWKTMLSEFQDQAELKQDQLFVLGCSTSEVAGSRIGTSGSVDIAESIYSGIiAELR 61 

Query: 66 HSRGIYLAVQGCEHVNRALVVEaEL?ffiRQQI£VVtm^NLHAGGSGQTOU\FKLMTSFTO 125 

GI+IiA Q CEH+NRALWEaE A+ +L V+ VP AGG+ AFK M SPV V 
Sbjct: 62 EKTGIHIiAPQCCEHLNRALVVEAETAKLFRLPTVSAVPVPKJiGGAMMYAi^ 121 



Query: 126 EEXVAHAGIDIGDTSIGiraiKRVQVPLIPISRELGGBUIVTMiaSRPKLIGGaRliGY 181 

E I A AGIDIGDT IGMH+K V VP+ LG AHVT +RPKLIGG RA Y 

Sbjct: 122 ETIQRmGIDIGDTFIGim.KPVAVPVRVSQHSIfiSaHVTIiaRTRPKLIGGVRAVY 177 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6899> which encodes the amino acid 
sequence <SEQ ID 6900>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2166 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 132/183 (72%) , Positives = 161/183 (87%) 



Query: 6 NKLKQETKAIWDIIERSALKKGQIFVLGLSSSEVSGGLIGKNSSSEIGEIIVEVILKEL 65 
N L+++T+ IV-t.D++ERSA++ G +FVLGLSSSE+ G IGK SS b;+G+I+VEV+L EL

Sbjct: 3 NNLEKQTREIVIDWERSAIQPGNLFVLGLSSSEILGSRIGKQSSLEVGQIWEWLDEL 62 

Query: 66 HSRGIYIAVQGCEHVNRALWESffiLAERQQLEVVlWVPiniHAGGSGQVAAFKLMTSPVEV 125 
+ RG++LAVQGCEHVNRALWE +AE +QLE+VNWPNLHAGGS Q+AAF+LM+ PVEV 
Sbjct: 63 NKRGVHIiAVQGCEHVMJALVVERHVAESKQIjEIVNWEimHAGaSAQMAAFQLMSD^ 122

Query: 126 EEIVAHAGIDIGDTSIGtffilKRVQVPLIPISRELGQAHVTALaSRPKLIGGARAGYTSDP 185 

EE++AHAG+DIGDT+IGMHIKRVQ+PLIP RELGGAHVTRL&SRPKLIGGARA Y D 
Sbjct: 123 EEVIAHAGLDIGDTAIGMHIKRVQIPLIPCQEELGGAHVTALASRPKLIGGARADYNMDI 182 

Query: 186 IRK 188 
IRK 

Sbjct: 183 IRK 185 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
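
In BLAST-style pairwise output, the line printed between each Query and Sbjct row repeats the residue letter where the two sequences are identical and shows "+" where a substitution scores positively. Under that convention the identity and positive counts can be tallied directly from the middle line. The sketch below assumes a clean middle line; `count_matches` is a hypothetical helper, and a middle line that has been damaged in transcription will give counts that differ from the printed summary.

```python
def count_matches(midline):
    """Return (identities, positives) for one BLAST-style middle line.

    Letters mark identical residues; '+' marks a positive-scoring
    substitution. Positives include the identities.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives

# Final block of the alignment above: Query "IRK", middle line "IRK", Sbjct "IRK".
print(count_matches("IRK"))    # (3, 3)

# Hypothetical middle line with one conservative substitution and one mismatch.
print(count_matches("A+C D"))  # (3, 4)
```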



Example 2234 

A DNA sequence (GBSx2353) was identified in S.agalactiae <SEQ ID 6901> which encodes the amino
acid sequence <SEQ ID 6902>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.25 Transmembrane 21 - 37 ( 13 - 46)
INTEGRAL Likelihood = -4.30 Transmembrane 78 - 94 ( 76 - 113)
INTEGRAL Likelihood = -2.07 Transmembrane 96 - 112 ( 95 - 113)

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 12
Sbjct: 7

Query: 70
       +L+I G+ S+L+AGAfil G+AIG GAQG +SD+V
Sbjct: 67

Query: 130
       G V G V VG+RT I FDGTm+IENBNI VSN SR B
Sbjct: 127

Query: 190
Sbjct: 187

Query: 250
Sbjct: 247



A related DNA sequence was identified in S.pyogenes <SEQ ID 6903> which encodes the amino acid 
sequence <SEQ ID 6904>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.49 Transmembrane 24 - 40 ( 15 - 45)
INTEGRAL Likelihood = -4.83 Transmembrane 78 - 94 ( 73 - 99)
INTEGRAL Likelihood = -2.07 Transmembrane 96 - 112 ( 95 - 113)

Final Results

bacterial membrane --- Certainty=0.4397 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB06385 GB:AP001516 unknown conserved protein [Bacillus halodurans]
Identities = 104/249 (41%) , Positives = 151/249 (59%) , Gaps = 4/249 (1%) 

Query: 22 KKLVSLIILLLFFAILKRVTNYLFEKTINKSFAYSRQSEARKKTLSKLTHNILNYLLYFL 81
          K LV++I L+AIKR++F+ + +SRTLKL+N+YLF+
Sbjct: 23 KVLVAVIAFLIVRAIGKRIISNSERRMAKNN QLSSGRWTLEKLSLNAFSYTLMFI 78

Query: 82
          +L++FG+ S+L+AGAGI G+AIG G&(3G +SD+V GFFIL E Q +VGD VT
Sbjct: 79

Query: 142
          ++G V VG+RT IRGFDGTLH+IENR+I VSN SRGNMRAL++I 4
Sbjct: 139

Query: 202
Sbjct: 199

Query: 262 LLKEGIQLP 270

An alignment of the GAS and GBS proteins is shown below.
Identities = 164/265 (61%) , Positives = 215/265 (80%) 

Query: 7 FIDHLNVEEVLFTFFTKLISILLLIIAFVIVRQVIISnfLFEKTVNRSLAFSRQKVARQKTL 66 

+++ ++E + T F KL+S+++L++ F I+++V NYLFEKT+N+S A+SRQ AR+KTL 
Sbjct: 7 YLEQSHIENIGLTIFKKLVSLIILLLFFAILKRVTNYLFEKTINKSFAYSRQSEaRKKTL 66 

Query: 67 AiajSmra^TLYFFLFYWILSILGVPISSLLRGAGIAGVAIGLGAQGFLSDVVNGFFIIi 126 

+KL+HN+LNY LYF L YWILS+ G+P+SSLLAGAGIAGVAIGLGAQGFLSDWNGFFIL 
Sbjct: 67 SKliTHNIL^ra■LYFLLIYWILSLFGIWSSLLAGaGIAGVAIGrlGAQGFLSDVVNaFFIL 126 

Query: 127 LENQFDVGDIINVGWSGTVT»rVGIRTTQIHDEDGTLHFIPNRNITIVS]\r^ 186 
ENQF+VGD + + + G+V VGIRTTQI FDGTLHFIKNR+IT+VSNKSR NMRA 1 

Query: 187 DIPLFVHTmDQISDIVTKIlffiEYVSiaiPAIVGEPTVrGPTTNAIKSQFVTO 246 

+IPL+ NL Q++ 1+ ++N++ + HP IVB+P + GP N+NGQF +RI IFT+NG 
Sbjct: 187 EIPLYSTVNLSQVTRIIDKTOQKELPNHPQIVGKENIIKSPQNNSNGQFTFRIAIFTEMGE 246 

Query: 247 QFDIYAEFYKLYQKAILEEGIDLPT 271 

QF lY FY+LYQ+A+L+EGI LPT 
Sbjct: 247 QFKIYHTFYRLYQEAIiLKEGIQLPT 271 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2235 

A DNA sequence (GBSx2354) was identified in S.agalactiae <SEQ ID 6905> which encodes the amino 
acid sequence <SEQ ID 6906>. This protein is predicted to be RopA (tig). Analysis of this protein sequence 
reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1785 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9283> which encodes amino acid sequence <SEQ ID 9284> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6907> which encodes the amino acid 
sequence <SEQ ID 6908>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0776 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 303/354 (85%) , Positives = 337/354 (94%) 
Query: 1 MSTSFENKRTNRGIITFTISQDEIKPALDQAENKVKKDIjNVPGFRKGHMPRWENQKF^^ 60 






MSTSFENKATNRGh-ITFTISQD+IKPALD+AFWK+KKDIiN pgfrkghmpr vfnqkfge 
Sbjct: 30 MSTSFENKATNRGVITFTISQDKIKPMiDKRFNKIKKDIJmPGFRKGHMPRPVFSra 89 

Query: 61 EMjYENAITOVLPKAYEftAVAELGLDVVaQPKIDWSIffiKGQISJKLTAEVVTKPE^ 120 

E LYE+AEN+VLP+AYEAAV ELGLDVVftQPKlDWSMSKG++W L+flEWTKPEVKLGD 
Sbjct: 90 EVLVEDMJ^IVLPEAYEftAVTELGLDVVRQPKIDWSlffiKGKEIWLSRE^ 149

Query: 121 YKDLSVEVDASKEVSDEE\mKVERERNmjaELTVKDGEARQG[yrW 180 

YK+L VEVDASKEVSDE+VHAK+ERER NLftEL +KDGEAAQGDTWIDFVGSVDGVEPD 
Sbjct: 150 YKNLVVEVDASKEVSDEtroaraERERQISnaELIIKDGBAAQGDTWIDFVGSVroVEF^ 209 

Query: 181 GGKGDNFSLELGSGQFIPGFEEQLVGSKAGQTVDVNVTFPEDYQAEDIAGKDAKFVTTIH 240 

GGKGDNFSLELGSGQFIPGFE+QLVG+KAG V+VNVTFPE YQAEDLAGK AKF+TTIH 
Sbjct: 210 GGKGDNFSLELGSGQFIPGFEDQLVGAKAGDEVEVNVTFPESYQAEDIAGKAAKFMTTIH 269 

Query: 241 EVKTKEVPALDDELAKDIDDEVETLDELKS.KYEKELESAKEIAFDDAVEGAAIELRVANA 300 

EVtCTKEVP LDDELAKDID++V+TL++IiK KYRKELE+A+E A+DDAVEGAAIELAVANA 
Sbjct: 270 EVKTKEVPKLDDELAKDIDEDVDTLEDljKVKYRKELEAaQETAYDDAVEGAAIELAVANA 329 

Query: 301 EIVELPEEMVHDEVHRftMNEFMCSNMQRQGISPEMYFQLTGTTEEDLHKQYQADA 354 

EIV+LPEEM+H+EV+R++KEFMGNMQRQGISPEMYFQLTGXT+EDI1H QY A+A 
Sbjct: 330 EIVDLPEEMIHEEVNRSVNEFMGNMQRQGISPEMYFQLTGTTQEDLHNQYSAEA 383 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2236 

A DNA sequence (GBSx2355) was identified in S.agalactiae <SEQ ID 6909> which encodes the amino 
acid sequence <SEQ ID 6910>. This protein is predicted to be galactose-6-phosphate isomerase laca subunit 
(rpiB). Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3491 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA25177 GB:M60447 galactose 6-P isomerase [Lactococcus lactis] 
Identities = 92/141 (65%), Positives = 115/141 (81%)

Query: 1 MTIIIGADMGVELKEVIRQHLTSLGKElIDLTDTSKDFVIMIAIVAKOTIQKEDNIfi 60 

M I++GAD 6 LK+V++ L G E+ID+T +DPVD TLA+ ++VN+ E NLGI+ 
Sbjct: 1 MAIVVGADLKGTRLKDWKNPLVEEGFEVIDVTKDGQDFVIOTTAVASEVNKnEQK^ 60 

Query: 61 VDAYGTOPEMVATKVKBMIAAEVSDERSLAYOTRAHNNiUMITrfiSEIVGPG^ 120 

+DAYG GPFMVATK+KGM+AAEVSDERSAYMTR HNNARMIT+6+EIVB +AK+I + F 
Sbjct: 61 IDAYGAGPFMVATKIKG^TOaAEVSDERSAYMTRGHNNARMITVQaEIVQDELaKNIAKAP 120

Query: 121 VDGTYDAGRHQIRVDMENKMC 141 

V+G YD GRHQ+RVDMimCMC 
Sbjct: 121 VNGKYDGGRHQVRVDMLNKMC 141

A related DNA sequence was identified in S.pyogenes <SEQ ID 6911> which encodes the amino acid
sequence <SEQ ID 6912>. Analysis of this protein sequence reveals the following: 



Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3224 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/140 (72%) , Positives = 117/140 (83%) 

Query: 1 MTIlIGaDAHSVELKEVIRQHLTSIXSKEIIDLTDTSKDFVDlWLAIVAKVNQKEDN^^ 60 

M II+GM5AHG LKE+I4. L G +IID+TD ■(- DF+DNTIA+ VN+ K IfilM 
Sbjct: 1 ^4AIILGM)AHGNALKELIKSFLQEEGYDIIDVTDINSDFIDl<^TLAVAKAVNEAEGRIlGIM SO 

Query: 61 VDAYGVGPFMVATKVKGMIA^iEVSDERSAYMTHAiDJNARMITLGSEIVGPGVAKHIVEGF 120 

VDAYG GPFMVATK+KGM+AAEVSDERSAYMTR HNNARMIT+G+EIVGP +AK+IV+GF 
Sbjct: 61 VmYGAGPFMVATKLKG^TOAEVSDEftSA™TRGHNNaImITI(3aEIVGPE^LfiKN^ 120 

Query: 121 VDGTYDAGRHQIRVDMLNKM 140 

V G YD GRHQIRVDMIiNKM 
Sbjct: 121 VTGPYDGGRHQIRVDMLNKM 140 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2237 

A DNA sequence (GBSx2356) was identified in S.agalactiae <SEQ ID 6913> which encodes the amino 
acid sequence <SEQ ID 6914>. This protein is predicted to be galactose-6-phosphate isomerase lacb 
subunit (rpiB). Analysis of this protein sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2511 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10189> which encodes amino acid sequence <SEQ ID 
10190> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA25178 GB:M60447 galactose 6-P isomerase [Lactococcus lactis]
Identities = 138/171 (80%) , Positives = 157/171 (91%) 

Query: 10 IffilAVGCDHIVTSDKIAVVDYLKTKGYEVIDCGTYDNIRTHyPIYGKKVGEAVASGKADL 69 

M+IA+GCDHIVT K+AV ++LK+KGYEV+D GTXD++RTHyPIYGKKVGEAV S6+ADL 
Sbjct: 1 miAIGCDHIOTDVKMAVSEFLKSKBYEVLDFGTYDHVRTHYPIYGKKVGEAVVSGQaDL 60 

Query: 70 GVCICGTGVGINNRVNKVPGIRaaLVRDLTSAITiaKEEiaffiNVIGPaGKITaGLLMTDII 129 

GVCICGTGTOimiaTOKVPG+RSRLVRD+TSA+YAKEEIJiffi^ ITGGLLM DII 

Sbjct: 61 GVCICGTGVGimiAVtTOTOTOSRLVEDmSALYAKEELNAWIGFGGMITaGLL^ 120 

Query: 130 EAFIRAKXKPTKENKVLIEKIAEVETHNAHQSSNDFFTEFLDKWNRGEYHD 180 

EAFI A+YKPT+ENK LI KI VETHNAHQ + +FFTEFL+KW+RGEYHD 
Sbjct: 121 EAFIEAEYKPTEENKKLIAKIEHVETHNAHQADEEFFTEFLEKWDRGEVHD 171 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6915> which encodes the amino acid
sequence <SEQ ID 6916>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3048 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 136/171 (79%) , Positives = 160/171 (93%) 

Query: 10 MKlAVGCDHIVTYDKIAVVDYLKTKGYEVIDCGTYDNIRTHIPIYGKKVGERVaSGKRDL 69 

MKIA+GCDHIVT +K+AV D+LK4-KGY+VIDCGTYD+ RTHYPI+GKKVGEAV 4G+ADL 
Sbjct: 2 MKIAIGCDHIVTNEKMAVSDFLKSKGYDVIDCGTYDHTRTHYPIFGKKVGEAWNGQADL 61 

Query: 70 GVCICGTGVGINNAVNiaTGIRSALVRDLTSAIYAKEELNANVIGFGGKITGGLI.MTDII 129 

GVCICGTGVGINNAVNKV'PGIRSALVRD+T+A+YAKEEMANVIGFGGKITG LLM DII 
Sbjct: 62 GVCICGTGVGINNAVNK\'PGIRSALVRDMTTALYAKEELNANVIGFGGKITGEr.LMCDII 121 

Query: 130 EAPIRAKYKPTKENKVLIEKIAEVETHNaHQEENDFFTEFLDKWNRGEYHD 180 

+AfI+A+YK T+ENK LI KIA +E+H+A+QE+ DFFTEFL+KW+RGEYHD 
Sbjct: 122 DAFIKaEYKETEENKKLIAKIAHLESEfflANQEDPDFFTBFnEKWDRGEYHD 172 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2238 

A DNA sequence (GBSx2357) was identified in S.agalactiae <SEQ ID 6917> which encodes the amino 
acid sequence <SEQ ID 6918>. Analysis of this protein sequence reveals the following:

n signal seq 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10187> which encodes amino acid sequence <SEQ ID 
10188> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 



Query: 11 MILTVTLNPSIDISYCLENFNITOTVlTOVTDVSKTPGGKaii^ 70 

MILTVTiaSIPS+DISY LE +m™iRV DVSKT GGKGRJVTRVIi + GD V ATG LG 

Sbjct: 1 MILTVTLNPSVDISYPLETLKIDTVNRVroVSKTAGGKjGUSIVTRVLYESGD^ 60 

Query: 71 GDFGDFIRSGLDALEIRHQFLSIGGETRHCIAVLHEGQQTEILEKGPHITKDEADAFLKH 130 

G G+FI S L+ + FIG TR+CIA+LHEG QTEILE+GP 1+ +EA+ FL+H 

Sbjct: 61 GKIGEFIESELEQSPVSPAFYKISCaWKNCIAILHEGNQTEILEQGPTISHEEAEGFLDH 120 

Query: 131 LKLIFDAATIITVSGSLPKGLPSDYYARLISLANHFNKKVVLDCSGEAliRSVLKSSAKPT 190 

+ + ++T+SGSLP GLP+DYY +LI WLDCSG L +VLKSSAKPT 

Sbjct: 121 YSlSnijIKQSEVVTISGSLPSGLPNDYYEKLIQIASDEGVAVVIiJCSGAPLETVLKSSAKPT 180 

Query: 191 VIKPNLEELTQLIGKPISYSLDELKSTLQQDLFRGIDWIVSLGARGAFAKHGNHYYQVT 250 

IKEN EEL+QL+GK ++ t+ELK L++ LF GI+W++VSI1G GAFAKHG+ +y+V 

Sbjct: 181 AIKPNMEELSQIiIjGKEOTKDIEELKiraiKESLFSGIEWIWSMRMGRF^ 240

Query: 251 IPKIBVINPVGSanATVA6IASALEHQLDDTNLLKRAN\7LGMIiNaQETLTGHI]!tt 310 

IP I V+NPVGSGD+TVAGIASAL + D +IiI.K A LGMIiNAQET+TGH+N+T Y+ 

Sbjct: 241 IPDIPVWPVGSGDSTVAGIASAIJffiKKSnaDIiKHA^rafiMIl^IAQETlyITGOT 300

Query: 311 LISQIQVKEV 320

L SQI VKEV 
Sbjct: 301 LNSQIGVKEV 310 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6919> which encodes the amino acid
sequence <SEQ ID 6920>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1178 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 184/310 (59%) , Positives = 232/310 (74%) , Gaps = 1/310 (0%) 

Query: 11 MILTVTIOTSIDISYCLENBTSimTVmVTDVSKTPGGKBUmK 70 

+ILTVTIJ!IP+ID+SY L-1- DTVMRV DV+RrEC3GKSLNV+RVL++ G+ V ATG +G 
Sbjct: 1 VILTVT]OT>AIDVSYPLDELKCETVNRVVDVTKTPGGKGIOTSRVENEFGETV^ 60 



Query: 131 LKLIPDAATIITVSGSLPKGLPSDYYARLISIiiNHFNKKWLDCSGEALRSVLKSSAKPT 190 

K + + ++T+SGSLP G+P DYY +LI +AN KK VLDCSG AL +VLK +KPT 
Sbjct: 120 FKYLLiNDVDWTLSGSLPAGMPDDYYQKliIKIANLNGKKTVLDCSGNALEAVLKGDSKPT 179 

Query: 191 VIKPNLEELTQLIGKPISYSIiDELKSTLQQDLFRGIDWVIVSLGARGAFAKHQNHYyQVT 250 

VIKPNLEEL+QL+GK ++ D LK LQ l-LF GI+W+IVSLGA G FAKH + +Y V 
Sbjct: 180 VIKERIjEIEXSQLIiGKEWrKDFnALKEVLQDELFDGIEWIIVSUSAIXOT 239 

Query: 251 IPKIEVINFVBSGnAWAGIASaiiEHQIiDDTNIiiaUiNVIX^^ 310 

IPKI++++ VGSGD+TVAGIAS L + DD LL +ANVLGMLiKIRQE TGH+N+ Y + 
Sbjct: 240 IPKIKIVSAVGSGDSTVAGIASGLANDEDDRALLTKANVLGMLNAQEKTTGHV^^ 299 

Query: 311 LISQIQVKEV 320 

L I+VKEV 
Sbjct: 300 LYQSIKVKEV 309 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2239 

A DNA sequence (GBSx2358) was identified in S.agalactiae <SEQ ID 6921> which encodes the amino 
acid sequence <SEQ ID 6922>. This protein is predicted to be tagatose 1,6-diphosphate aldolase. Analysis
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0369 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

Identities = 253/325 (77%), Positives = 295/325 (89%)

Query: 1 MGLTEQKQKHMEQLSDKNGIISMAFDQRGaU^KRLMAKYQSEEPTVSQIE^iLKVLVAEEL SO 

M LTEQK+K +E+LSDKNG ISALAFDQRGALKRLMA+YQ EPTV+Q+E LKVLVA+EL 
Sbjct: 1 MVLTEQKRKSLEKLSDKMGFISAIAHDQRGMiKRLMJ^QDTEPTVAQiyiEELKVIiVADEL SO 

Query: 61 TPYASSMLIiDPEYGLPATKVLDDNAGLLLAYBKTGYDTSSTKRLPDCLDIWSAKRIKEE 120 

T YASSMLLDPEYGLPATK LD AGLLIiR+EKTGYDTSSTKRI.PDCLD+WSaKRIKE+G 
Sbjct: 61 TKYASSKILLDPEyGLPATKaLDKSMLIJlAPEKTGYDTSSTKRLPDCLDW^ 120 

Query: 121 ftiaVKFLLYYIMJSSDEVHEEKEAYIERIGSECVREDIPFFLEILSYDEKITDSSGIEYA 180 

ftDAVKFLLYYDVDSSDE+N++K+AYIER+GSECVaEDIPPFLEIL+YDE+I+D+ +EYA 
Sbjct: 121 ADAVKPLLYYDVDSSDELNQQKQAYIERVGSECVREDIPFFLEIIAYDEEISBAGSVEYA 180 

Query: 181 KIKPRKVIEftMKVFSNPRFNIDVLKVKVPVMMDYVEGFAQGETAYWKATAftAYFREQDQA 240 

K+KPRKVIEAMKVFS+PRFNIDVLKVEVPVN+ YVEGFA GE Y+KA AA +F+ Q++A 
Sbjct: 181 KVKPRKYIEAMKVFSDPRFNIDVLKVEVPVNVKYVESFADGEVVYSKAEAADFFKaQEEA 240 

Query: 241 TLLPYIFLSAGTOAQLFQETLOTAKEAfiAKFNGVLaSRATWAGSVKEYVEKGEAGaRQWL 300 

T LPYI+LSAGV A+LFQETL FA ++GAKPNGVLCX3RATWAGSV+ Y+++GE AR+WL 
Sbjct: 241 miPYIYLSAGVSMaFQETLQFAHDSGAKFMGVLCCSRATWAGSVEPYIKEGEK^^ 300 

Query: 301 RTIGFQNIDEIjNKILQKTATSWKBR 325 
RT GF+NIDELNK+L KTA+ W ++ 

Sbjct: 301 RTTGFEWIDERJKVLVKTASPWTDK 325 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6923> which encodes the amino acid 
sequence <SEQ ID 6924>. Analysis of this protein sequence reveals the following: 

Final Results 

bacterial cytoplasm --- Certainty=0.0600 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 230/323 (71%), Positives = 276/323 (85%), Gaps = 1/323 (0%) 

Query: 3 LTEQKQIOraEQLSDKNGIISALAFDQRGALKRLMAKYQSEEPTVSQIEALKVLVAEELTP 62 

LTE K+K ME+LS +G+ISAIAFDQRGALKR+MA++Q++EPTV QIE LK LV+EELTP 
Sbjct: 5 LTENKRKSMEKLS-VDGVISAIAFDQRGALKRIIMAQHQTKEPT^/EQIEELKSLVSEELTP 63 

Query: 63 YASSMLLDPEYGIiPATKVlDDNAGLLLAYEKTGYDTSSTKRLPDCLDIWSAKRIKEEGAD 122

+ASS+LLDPEYGLPA++V + AGLLLAYEKTGYD ++T RliPDCLD+WSAKRIKE GA+ 
Sbjct: 64 FASSimDPEHGI.PASRVRSEEAGLLIAYEKTGYDA'rrTSRLPDCLDVWSAKRIKEAfiAE 123 

Query: 123 AVKFLLYYDVDSSDEVMEEKEAYIEKIGSECVAEDIPFFIiEILSYDEKITDSSGIEYRKI 182 

AVKPLLYYD+D +VME+K+AYIERIGSEC AEDIPF+LEIL+YDEKI D++ E+AK4- 
Sbjct: 124 AVKFLLYYDIDGDQDVNEQKKAYIERIGSECRAEDIPFYLEILTYDEKIADNASPEFAKV 183 

Query: 183 KPRKVIEUy»IKVFSMPRFHIDVLK^/EVPVim)YWGFAQGETAYNKATAAAY 242 

K KV EAMKVFS RF +DVLKVEVPVHM +VEGFA GE + K AA FR+Q+ +T 
Sbjct: 184 KAHICVmiyv]KVFSKERFGTOVLKVEVP\™K?VEGFADGEVLFTKEEAAQaFRDQEASTD 243 

Query: 243 LPYIFLSAGVPAQLFQETLVFAKEAGAKFNGVLCGRATWAGSVKEYVEKGEAGARQWLRT 302 

LPYI+LSAGV A+LFQ+TLVFA E+GAKFNGVLCGRATWAGSVK Y+E+G AR+WLRT 
Sbjct: 244 LPYIYLSAGVSAKLFQDTLVFAAESGAKFNGVLCGRRTWAGSVKVYIEEGPQAAREWLRT 303 

Query: 303 IGFQNIDELNKILQKTATSWKER 325 

GF+NIDEMK+L KTA4- W E+ 
Sbjct: 304 EGFKNIDELNKVLDKTASPMTEK 326 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2240 

A DNA sequence (GBSx2359) was identified in S.agalactiae <SEQ ID 6925> which encodes the amino 
acid sequence <SEQ ID 6926>. This protein is predicted to be lacx protein, chromosomal. Analysis of this 
protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0543 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10185> which encodes amino acid sequence <SEQ ID 
10186> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



Query: 24 MAITIQNHELQVTLKALGRTl';TSITDSQGTOYL,WCGDATTi1GGQAPILFPICGS\7RNDCV 83 

M X ++N L V K LG +'.rSI D G+Eyi",WQ D YW GQAPILFPICC3S+RND 
Sbjct: 1 MTIELKNEYLTVQFKTLGSQLTSIKDKDGL3YLWQflDPEYWl>IGQAPILFPICGSLRNDMA 60 

Query: 84 lYRPAQAPHFTGIlPRHGFVRHKTFDYDYISDSSVRFTIKSSKEMLINYPYRFSLEITSn 143 

lYRP + P FTG+I RH6FVR + F + ++++SV F+IK + EML NY Y+F L + -XT 
Sbjct: 61 lYRPQERPFFTGLIiaiHGFVRKBEFTLBEVNENSVTFSIKENAEMIjiaraiYQFELRVV^ 120 



H V NIiE+EK MPY IGAHP ENCPL E E + DY LEF + E+C+IP+SF 



Query: 204 PDTGIJ^LQaRHPF]»IQRQLSIJSlHM.FEKimTLDQLRS 263 

P+TGLLDLQ R PFLENQK L L+++LF naiTLD+L+S++V L+SR KG+++DFDD 
Sbjct: 181 PETGLIiDLQDRTPFLENQKSLDLDYSLFSHDRITLDRIiKSRSVTIiRSRKSGKGLRVDFDD 240 

Query: 264 FEHLILITOSNNGGPFIALEPWSSLSTSIEESDILEDKQNIVRIiNPKQSRQHSIRITIL 321 

F raiILW++ N PF+MiEPWS LSTS+EE +ILEDK + ++ P + + S ITIL 
Sbjct: 241 FPNLILWSTXinCSPFIAl^PWSGLSTSLEEGNILEDKPQVTKVLPLDTSKKSYDITIL 298 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2241 

A DNA sequence (GBSx2361) was identified in S.agalactiae <SEQ ID 6927> which encodes the amino 
acid sequence <SEQ ID 6928>. This protein is predicted to be an ABC transporter. Analysis of this protein
sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3272 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10183> which encodes amino acid sequence <SEQ ID
10184> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA51350 GB:X72832 leucine rich protein [Streptococcus 
equisimilis] 

Identities = 101/278 (36%), Positives = 160/278 (57%), Gaps = 1/278 (0%) 

Query: 10 ^©FKEI,FPKVITKQEVKQSEDYIIVEQDGHVLHFPKSSLTKRELYIl]:aflTPSLE^ 69

M+ K+ FPE+ ++++ V++ +HFPKS I)+++E LL++ + 

Sbjct: 1 MEIiKDYFPEMQVGPHPLGDKEWVSVKEGDQYVHFPKSCLSEKERLLLEVGLGQYEVLQ-P 59 

Query: 70 SQNPWYRYLVEGRGRLPQSHSAVQFIFIEHQFTLSEELKDFLSPLVINVETIMTINQTQS 129 
+PW RyL++ +G PQ QFI + + HQ L +L + L ++ +E 1+ 1+ TQ+

Sbjct: 60 LGSPWQRYLLDHQGNPPQLFETSQFIYLNHCJQVLPADLVELLQQMIAGLEVILPISTTQT 119 



Query: 130 VMILNQDNFENATELLTDILPTIElTOFNTRLRCYFGNSWTHLQAVDWKELYEEEyKLFTL 189

+ Q L +LPT+E+DF L + GN+W + A +E +EEE +L T
Sbjct: 120 AFLOlQATSIKVIilSLEGIiPTLESDFGLALTMFVGNAWYQWAaGTLRECFEEK 179

Query: 190 PLSHKJfflQHYOlFPKMALWaiJaiQSPMPSIKAKCLQHIUyrSDTSailK&LWQEQGNI^ 249

+L K+ F ++ LW++ + P++ + Q + SD + ++ ALW E GNL +
Sbjct: 180 YLKQKSGGKLLTFAEVMLWSILSHQSFPALTRQFHQFLNPQSDMADVVHRLWSEHGNLVQ 239

Query: 250 TAKALPIHRNSIiQYKLDKFTQSSGIJSIjKIIinDLAYAYL 287

TA+ L+IHRNSLQYKLDKF Q SGL+LK LDDLA+AYL
Sbjct: 240 TAQRLYIHRWSLQYKLDKFAQQSGLHLKQLDDLAFAYL 277





A related DNA sequence was identified in S.pyogenes <SEQ ID 6929> which encodes the amino acid
sequence <SEQ ID 6930>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4332 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 106/287 (36%) , Positives = 169/287 (57%) , Gaps = 4/287 (1%) 

Query: 3 KTVVED-AMDFKELFPEVITKOEVKQSEDYIIVEQnGHVLHPPKSSLTKRELYLLQM-TP 60 
KTV++ AM+ K+ FPE+ +D++ +++ +HPPKS L+++E LL+H- 

Sbjct: 7 KTVMKBMRMELKDYFPEMQUGPHPLGDKDWMSIKEGDQYVHPPKSCLSEKERLLLEVGW 66



Query: 61 SLEnASSVDSQNPWYRYLVEGRGRLPQSHSAVQFIPIEHQFTLSEEIiKDFLSPLVINVET 120 

E 1- S PW RYL++ +G PQ ■)• QFI++ HQ I. ++L + L ++ +E 
Sbjct: 67 QCEVLQPLGS--PWQRYLLDHQGNPPQIiYETSQFIYIiNHQQALPDDLVELLQQMIAGLEV 124 

Query: 121 IMTINQTQSVMimQDNFFmTELLTDILPTIENDENTRLRCyFGNSWTHLQAVDWKELY 180 

1+ 1+ TQ+ + Q L D+LPT+E+DF L + GN+W + A +E + 

Sbjct: 125 ILPISATQTAFLCRQAISIKWLRWLEDLLPTLESDFGLALTMPVGNAKKQ^fflAGTIJiEC^ 184 

Query: 181 EEEYKLFTLFLSHKAEQHYCRPPKMALWAIANQSPMPSIKAKCLQHIIiDTSDTSAIIKAL 240 

EEE +L T +L ++ + F + LW+L + ++ + Q + SD + ++ AL 

Sbjct: 185 EEECQLI.TAY^RQQSGRKIlLTFSGL^ttWSLIlSHmFIlaLTRQFHQFLSPQSD^lADVVHAIi 244 



Query: 241 VJQEQGOTJacrAKALFIHRNSLQYKLDKFTQSSGLNLKILDDLAYAYL 287 
WE GNL +TA+ L+IHRHSLQYKLDKP Q SGL+LK LDDLA+A+L

Sbjct: 245 WSEHGNLVQTAQRLYIHRNSLQYKLDKFAQQSGIJHLKQLDDLAFAHL 291 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2242 

A DNA sequence (GBSx2362) was identified in S.agalactiae <SEQ ID 6931> which encodes the amino 
acid sequence <SEQ ID 6932>. This protein is predicted to be multiple sugar-binding transport ATP-binding 
protein msmk (malK). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4392 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

mutans]

Query: 1
Sbjct: 1

Query: 61
       ELKIDGEVVNDK+PKDRDIflMVFQNYM.YPHM+\7yDNMaFGLKLR +SK+ IDKRV+EAA
Sbjct: 61

Query: 121
Sbjct: 121

Query: 181
       IH+RIG+TTIYVTHDQTEAMTLftDRIVIMS+TKN DG C3TIG++EQVG+PQEIiyN RANK
Sbjct: 181

Query:
Sbjct: 241

Query: 301
       SS+LLVQ+TYP+A V+AEV+VSELLGSETMLY+KLGQTEPA+RV+ARDFH PGEKV+LTF
Sbjct: 301

Query: 361
       NVAKGHFFDA+TE AIR
Sbjct: 361 NVAKGHFFDAETEAAIR 377

A related DNA sequence was identified in S.pyogenes <SEQ ID 6933> which encodes the amino acid 
sequence <SEQ ID 6934>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4542 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 332/377 (88%), Positives = 359/377 (95%) 

Query: 1 MVEMLNHIYKKYPSASHYSVEDFDLDIKDXEFIVFVGPSGCGKSTTLRMIAGLEDISEG 60

MVELNLNHIYKKyP+ +hy+vedfdldikd:<efivfvgi?sgcgksttlrmiaglediseg 

Sbjct: 1 MVEIiNIiNHIYKKYPNTTHYAVEDFDIjDIKDKEFIVFVGPSGCGKSTTLRNIAGLEDISEG 60 

Query: 61 ELKIDGEWNDKSPKDRDIAMVFQNYALYPHMTVYDNMAFGLKLRKPSKQEIDKRVREAA 120 

ELKI GEVVNDKSPKDRDIAIWFQlinfALYPHMTVYDNMaFGLKLRK+ K +ID+RV+EaA 
Sbjct: 61 ELKIGGBVVNDKSPKDRDIA^WFQI^fALYPHMTVYD)SIMAK3LK^ 120 

Query: 121 ANlGLTEETjERKPADLSGGQRQRVAMGRAIVRDAKVFIJ©EPLSirajDAKLRVSra?AEIAK 180 

4GLTEFI^KPADLSGGQRQRVaMGRAIVRDflKVFIjyiDEPLSNLDAKLRVSMRAEIAK 
Sbjct: 121 QILGLTEFLERKPADLSGGQRQRTOMGRAIVRDAKVFLMDEPLSNLDAKLRVSMRAEIAK 180 

Query: 181 IHQRIGSTTIYVTHDQTEAMTIADRIVIMSATKNPDGDGTIGKIEQVGSPQELYNLPANK 240 

IH+RIGSTriYVTHDQTBAMTLADRIVIMSATKNP G+GTIGKIEQVGSPQELYNLPANK 
Sbjct: 181 IHRRIGSTTIYVTHDQTEAMTLADRIVIMSATKNPQGNGTIGKIEQVGSPQELYNLPANK 240 

Query: 241 FVAGFIGSPSMNFFKVKVENGMIISEDGLRIAlPEGQEKLLESRGyRGKELIFGIRPEDI 300 

FVAGFIGSP+MNFFH-V+V++G I+SEDGL lAIPEGQ K+LE+ 6YKG+++ FGIRPEDI 
Sbjct: 241 FVaGFIGSPAMNFFEVEVKDGRIVSBDGLDiaiPEGQRKrajEaftGXKGEKVTFGIRPE^ 300 



Query: 361 NVAKGHFFnaDTEQ&IR 377 

NVAKGHFFD DTEQAIR 
Sbjct: 361 NVARGHPFDRDTEQAIR 377 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2243 

A DNA sequence (GBSx2363) was identified in S.agalactiae <SEQ ID 6935> which encodes the amino 
acid sequence <SEQ ID 6936>. This protein is predicted to be glucan 1,6-alpha-glucosidase (dexB) (treC). 
Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2525 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA51348 GB:X72832 glucan 1,6-alpha-glucosidase [Streptococcus equisimilis]

Identities = 413/535 (77%), Positives = 476/535 (88%), Gaps = 1/535 (0%)

Query: 1 MKKHWWHKATIYQIYPRSFMDSIXSDGVGDIKGITSKLDYIiEKLGITAIWLSPVYQSPMDD 60 

M+K WMHKaTIYQIYPRSF D+ G+G+GD+KBITS+LDYL+KLSITAIMLSPVYOSEMDD 
Sbjct: 1 MQKQWWHKATIYQIYPRSFKDTSGNGIGDtiKGITSQLDYLQKLGITAIWLSPVYQSPMDD 60 

Query: 61 HGYDISDYQAlADIFGI»IiromQLLQEANQRGIKIimLVVNHTSDEHAWFVEftRENPNS 120 

NGYDISDY+AIA++FG+M+DMD LL AN+RGIKIIMDLWNHTSDEHAWFVEaRENPNS 
Sbjct: 61 NGYDISDYEAIAEVFGNimDMDDIJ^AaANERGIKIimLVVNHTSDEHAWFVEAREl^ 120 

Query: 121 PERDFYIWRDEPNDLTSIFSGSAWEYDKTOGQVYIJILPSKRQPDIJSJWENEAIJ^ 180 

PERD+YIWRDEPN+L SIFSGSAWE Dh- SGQYYIiHLPSK+QPDIiNWEN +R KXXDMM 
Sbjct: 121 PERDYYIWRDEPtMtMSIFSGSAimDEASGQYYLHLPSKKQPDIiM^^ 180 



Query: 181 IMIDKBIGGFRMDVIDLIGKlPDKGITGiraPKLHDYLKEMNRaSFOKHDLLTVGETWGA 240 
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NFWI KGIGGFRMDVTDLIGKIPD ITGNGP+LHDYLKEMN+A+FG HD++TVGETWGA 
Sbjct: 181 NFWIfiKGICMPiaroVIDLIGKIPDsklTGNGPRLHDYLKEMIQATFGNHDV^^ 240 

Query: 241 TPDiaKQYSNPDNEEISMVFQFEHVGI^HKPnAPKWDYSDGLDVPALKDIFTKmQTO^^ 300

TP+IA+QYS P+N+ELSMVFQFEHVBLQHKP+APKWDY++ LDVPALK IF+KWQT+L+L 
Sbjct: 241 TPEIaRQYSRPENKELSMVFQFEHVGLQHKP^IAPKWDY^iEELDVPi^lKTIFSKWQTEL^ 300 

Query: 301 GQGTOSLFWBninroLPRVLSIWGKIDSDNRKQSAKALAiriLHI^ 360 

G+GWNSLFWNNHDLPRVLSIWGNDS R++S&KRLRILLHLMRGTPYIYQGEEIGMTNYP 
Sbjct: 301 GEG^mSLFWmimDLPRVLSIWGHDSIYREKSAKMAIrJLHIJMRGTPYIYQGEEIG^mW 360 

Query: 361 FECrJiDVDDIESLNYAKEaMDNGVSEATlLDSIRKVGRDNaRTPMQWSQEHQftGFTKG-T 419 

F+ L +VDD:ESljNYAKEMyi+NGV A ++ SIRKVGRDHARTPMQWS++ AGF++ 
Sbjct: 361 FKDLTETODIESLNYAKEAMENGVPAARVMSSIRKVGRDNARTPMQWSKDTHAGFSEAQE 420 

Query: 420 PWIAWPNYQEINVEAALNDTESIFYTYQIOjVALRKEHDWLVDADFKLIjETADKVFAYVR 479 

WL VMPNYQEINV AL + +SIFYTYQ+L+ALRK+ DWLV+AD+ LL TMJKVFAY R 
Sbjct: 421 TWLPWPNYQEIWMJALfiNQDSIFYTYQQLIJUiRKDQDWLVEfiDYHLLPTADKVFAYQR 480 

Query: 480 QTIjKERYLIVftNLSDQNQSFEFPEAVKETIISirrEVQEVLSSNTLKPWDAFCIEL 534 

Q +E Y+IV N+SDQ Q F A E +I+)SIT+V +VL + I1+PWDAFC++L 
Sbjct: 481 QFGEETYVIVVNVSDQEQVFAKDLRGAEWITireDVDKVI^Tian^PWDAFCVKL 535 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6937> which encodes the amino acid
sequence <SEQ ID 6938>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2793 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 418/535 (78%), Positives = 474/535 (88%), Gaps = 1/535 (0%)

Query: 1 MKKHWWHKATIYQIYPRSF^roSDGDGVGDIKGITSKl:J3YLEKr,GITAIWLSFVYQSPMDD 60 

M HWWHKATIYQIYPRSF D+ G+G+GD+KGITS+LDYL+KLGITAIWLSPVYQSPMDD 

Sbjct: 1 mnnhwwhkatiyqiyprsfkdtsgngigdlkg:tsqldylqklgitaiwlspvyqspmdd so 

Query: 61 NGYDISDYQAIADIFGDMNDMDQLLQEfiNQRGIKIIMDLWKHTSDEHAWFVEARKNPNS 120 

NGYDISDY+AIAD+FGDM DMD+LL AN4RGIKIIMDLVVNHTSDEHAWFVEARENPNS 
Sbjct: 61 NGYDISDYEAIADVFGDMADMDELLAAflNERGIKIIMDLWNHTSDEHAWFVEARENPNS 120 



Query: 181 NFWIDKGIGGFRhCWIDLIGKIPDIQGITGNGPKLHDYLKEMimSFGKHDLLTVGETWGA 240 

NFWI KGIGGFRMDVIDLIGK+PD ITGNGP+LHDYIiKEMN+A+FG HD++TVGETWGA 
Sbjct: 181 NFWIAKGIGGFRhCWIDLIGKVPDLEITGNGPRLHDYLKEMSQATFGNHDViynTOETWGA 240 

Query: 241 TPDIAKQYSNPEINEELSMVFQFEHVGLQHKPnAPKWDYSDQLDVPALKDIFTKWQTQLEL 300 

TP+IA4QYS P+N+ELSMVFQFEHVGLQHKPDAPKWDY+ LDVPALK IF+KWQT+L+L 
Sbjct: 241 TPEIARQYSRPENKELSMVFQFEHVGLQHKPDAPKWDYAKELDVPALKAIFSKWQTELKL 300 

Query: 301 GQGWNSLFWNNHDLPRVLSIW(a^DSDKRKQSAKALAILLHL^mGTPYIYQGEEIGMTNYP 360 

G+GWNSLFWNNHDLPRVLSIWGNDS R++SAKRI1RILLHLMEGTPYIYQGEEIGMTNYP 
Sbjct: 301 GEGWNSLFWNNHDLPRVLSIWGNDSTYREKSAKALAILLHLMRGTPYIYQaEEIGMTNYP 360 

Query: 361 FECLftDVDDIESLNYAKEAMDWGVSEATILDSIRKVGRDNaRTPMQWSQEHQAGFTKG-T 419 

F+ L +V+DIESLNYAKEaM NGVS A ++DSIRK7GRra!mRTPMQWS++ AGF++ 
Sbjct: 361 FKDLTEVNDIESrlNYAKEaM6NGVSAARV^lDSIRKVGRDNaRTPMQWSKDTHAGFSEAKE 420 

Query: 420 PWLAVHPNYQEINVEAAIMJTESIFYTYQKLVALRKEHDWLVnaDFKLLETADKVPAYVR 479 
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WD VNPNYQ+INV AL D +SIFYTYQKL+ALRKE DWLV+AD+ IiL TADKVFRY R 
Sbjct: 421 TWLPVNPISrrQDINVMMiADPDSIFYTYQKLIJU^iyCEQDW^ 480

Query: 480 QTDKERVLIVJiNLSDQNQSFEPPEAVKETIISNTEVQEVLSStmiKPWISUi'CIEL 534 

Q +E Y+IV N+SD+ Q F A + II+NT+V VL + L+PWDAFC-H-Ii 
Sbjct: 481 QLGEETYVIVVlWSDEEQVEATDIAGaCJVIIAimiWJIVLEI^ 535 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2244 

A DNA sequence (GBSx2364) was identified in S.agalactiae <SEQ ID 6939> which encodes the amino 
acid sequence <SEQ ID 6940>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB49738 GB:U21942 UDP-galactose 4-epimerase [Streptococcus mutans] 
Identities = 267/331 (80%) , Positives = 306/331 (91%) 

Query: 1 MAVLILGGAGYIGSHMVDQLITQGKEKVIWDNLVTGHRQAVHSDAIFYEGDLSDKTFMR 60 

MA+L+LGGAGYIGSHMVD+LI +G+E+V+VVD+LVTGHR AVH A PY+GDL+D+ FM 
Sbjct: 1 MAILVLGGRGYIGSHMVDRLIEKGEEEVWVDSLVTGHRAAVHPAAKFYQGDIiADREFMS 60 

Query: 61 QVFRENPDVDAVIHFAAFSLVAESMENPLKYFDNNTAGMIKLEEVMNECDIKNIVFSSTA 120 

VFRENPDVn&VIHFAA+SLVAESM+ PLKYEnNNTAGMIKLIiEVM+E +K IVFSSTA 
Sbjct: 61 ICTRENPirmAVIHFAAYSLVAESMKKPLKYFDNNTAGMIKLl^^ 120 

Query: 121 ATYGIPEQVPILETAPQNPXNPYGESKIJMNIETIMKWADQAYGIKFVAIEYFH^ 180 

ATYGIP ++PI ET PQ PINPYGESKLMMETIMKW+D+AYGIKFV +RYEISIVRS KPDG 
Sbjct: 121 ATYGIPNEIPIKETTPQRPINPYGESKLMMETIMKWSDRAYGIKFVPVRYFNVAGAKPDG 180 

Query: 181 SIGEDHKPETHLLPIILQVAQGVRDKIMIFGDDYNTPTCTNVRDYVHPFDIiADAHILAVD 240 

SIGEDH PKTHLLPIILaVAQGVR4KIMIFGDDYNTPDGTNVRDYVHPFDLAD H+):A++ 
Sbjct: 181 SIGEDHSPETHLLPIILQVAQGVREKIMIFGDDYNTPDGTNVRDYVHPFDLADRHLLAEN 240 

Query: 241 YLRQGNESNVFNLGSSTGFSNLQMLEaARRITGKEIPAQKAARRPGDPDTLIASSEKARQ 300 

YLRQGN S FNLGSSTGFSNLQ+LEaAR++TG++IPA+KAARR GDPDTLrASSEKaR+ 
Sbjct: 241 YLRQCSNPSTAEraiGSSTGFSNIfilLEAARiCVTGQKIPAEKAARRSGDPDTLIASSEKAEB 300 



++GW+P+PD+I+KII+SAWRWHSSHP 6Y+D 



Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2245 

A DNA sequence (GBSx2366) was identified in S.agalactiae <SEQ ID 6941> which encodes the amino 
acid sequence <SEQ ID 6942>. This protein is predicted to be two-component response regulator. Analysis 
of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3945 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06470 GB:AP001516 two-component response regulator [Bacillus halodurans]
Identities = 71/223 (31%), Positives = 139/223 (61%), Gaps = 7/223 (3%)

Query: 3 VLIIEDDPMVEFIHRNYLEKIiNYFQNIYSTASQTQAlAYLNDlKIQLVLLDIHIKEGNGL 62
VL+IEDDPMV+ ++R ++EKL+ F + +TA+ + + +++ L+LLDI + + +GL
Sbjct: 9 VILIEDDPMVQEVNRMFVEKLSGFTIVGTTATGEKGMVKTRELQPDLILLDIFMPKQDGL 68

+K +R Q+ + ++I ++AMI+ T+K G++DYL+KPFTFER

I- V+VR+Y+ Y+E G + Y +GRP + YKL



A related DNA sequence was identified in S.pyogenes <SEQ ID 6943> which encodes the amino acid 
sequence <SEQ ID 6944>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.4053 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 123/220 (55%), Positives = 156/220 (70%)

Query: 1 mVLIIEDDPiyiVEFIHRNYLEKLHYFQNIYSTASQTQAIAY]^IKIQLVLLDlHIKEGN 60 

M+VLIIEDDPMV+FIHRNYLEKLN FIS+S +LDI L+LLDIHI +GN 

Sbjct: 1 MNVLIIEDDPMVDFIHRNYLEKLNLFDRIISSDSMKAVQSILTDYAIDLILLDIHITDGN 60 

Query: 61 GLELLKLUaiQHQNTEVIVISaaNEaHTVKEAFHI^IVDYLIKPFTPERFESSIEKEI^ 120 

G++ L+ R QH EVI+ISaAN+ + +++ PHI.GI+DYLIKPFTFERF+ SI++F+ H 
Sbjct: 61 GIQFLEKMRTQHIPCEVIIISftANDGNIIRDGEHLGIIDYIiIKPFTFEREQESIQQFVTH 120 

Query: 121 YHTPEADKIYQDNIDHFQKlDSGWLEGEVKIiDEKGLSEITYQHILDAIQELEQPFTIQEL 180 

++ Q ID + + S +L EKHLSE T+Q I++ 1+ +QPFTIQEIa 

Sbjct: 121 REHIMQQLEQAQIDQLKCLTSKKDTKNKQLLEKGLSESTFQWIMENIKVFDQPPTIQEL 180 

Query: 181 AKCSQFSHVSVRKYIAYMEEKGLLTSQQIYTKVSRPYKVY 220 

A SHVSVRKYIAY+EE L SQQI+TKVGRPY+VY 

Sbjct: 181 ASACHLSHVSVRKYIAYLEENRQUJSQQIFTKUBRPYRVY 220

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2246 

A DNA sequence (GBSx2367) was identified in S.agalactiae <SEQ ID 6945> which encodes the amino 
acid sequence <SEQ ID 6946>. Analysis of this protein sequence reveals the following: 
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Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.76 Transmembrane 12 - 28 ( 6 -
INTEGRAL Likelihood = -7.43 Transmembrane 178 - 194 ( 173 -

Final Results

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9003> which encodes amino acid sequence <SEQ ID 9004> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG : 0 

McG: Length of UR: 27
Peak Value of UR: 2.99
Net Charge of CR: 3
McG: Discrim Score: 12.92
GvH: Signal Score (-7.5): -2.57
Possible site: 19
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -8.76 threshold: 0.0
INTEGRAL Likelihood = -8.76
INTEGRAL Likelihood = -7.43
PERIPHERAL Likelihood = 3.18
modified ALOM score: 2.25
icm1 HYPID: 7 CFP: 0.450

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
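
Each "Final Results" block in this section lists three candidate locations with a certainty value and a call. The following is a minimal Python sketch of how such a block could be read and the highest-certainty location picked out. It assumes the line layout shown above; the names `RESULT_LINE`, `parse_final_results` and `best_localization` are illustrative only and are not part of the analysis software used here.

```python
import re

# Assumed layout of one "Final Results" line, as printed in this section:
#   bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"^\s*(?P<site>bacterial \w+)\W*Certainty\s*=\s*"
    r"(?P<certainty>\d+\.\d+)\s*\((?P<call>[^)]+)\)"
)

def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples, one per matching line."""
    results = []
    for line in text.splitlines():
        m = RESULT_LINE.match(line)
        if m:
            results.append(
                (m.group("site"), float(m.group("certainty")), m.group("call").strip())
            )
    return results

def best_localization(results):
    """Return the entry with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda r: r[1])

SAMPLE = """\
bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

parsed = parse_final_results(SAMPLE)
best = best_localization(parsed)
print(best)
```

Lines that do not match the assumed layout are skipped rather than guessed at, so a damaged block yields fewer entries instead of wrong ones.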

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15141 GB:Z99120 similar to two-component sensor histidine kinase [YufM] [Bacillus subtilis]
Identities = 132/461 (28%), Positives = 245/461 (52%), Gaps = 7/461 (1%)

Query: 3 MKKKLSLWAFLSLILVTMTICIFSIFYYVTIHQSYRMVRVQEEKILKKTGXALSRNPCIVI 62 

MKK L L L++ + + + I ++ Q4- + +R QE+ T ++ P 

Sbjct: 1 MKKTLKLCSTRLTIFVCIVVLIALLITFFTVGAQTTKRIRDQEKATAIKSTAEi^^ 60 

Query: 63 QTLKDNHyDQSLQKQMLFLSKKSNLDYIVLINLKGIRFTHPDSTKIGKPFQGGDEQAVPK 122 

L+ + LQ + K + +++V++++ GIR THPD +KIGK F+GGDE V K 

Sbjct: 61 AALESGKKQKELQSYTKRVQKITGTBFVVVMDMNGIRKTHPDPSKIGKKFRGGDESEVLK 120 

Query: 123 QKAIMSTAEGSLGKELRYLIPVY-DHQKQVGAIAVGLKLTTLGDLSQSSIKEFSKPLLIS 181 

G +STA Gh-LGKS R +PVY ++ KC3VGA+AVG+ + + ++ S++ + +S 

Sbjct: 121 GHVHISTASGTLGKSQRAFVPVXAENGKQVGAVAVGITVNEIDEVISHSLRPLYFIICVS 180 

Query: 182 ILISLWTSIISYGLKKQLHNLHPSDIFQHLEERNATLDQIQAAVFVIDQRHIIKENNPA 241 

I + ++ I++ +K ++ L P +1 LEER+A L+ + + +D+ IK N 
Sbjct: 181 IFVGVIGAVIVARTVKNIMYGLEPYEIATLLEERSaMLESTKEGILAVDEMGKIKLANAE 240 

Query: 242 ASLLFKKEGQRDLFSGKLLESLIP--QLKQDHFSKK--TEQVLHFQGQDYLLSISPITVK 297 

A LF K G 4 ++ ++P +LK+ +KK ++ + G + + + PI +K 

Sbjct: 241 AKRLFVKMGINTNPIDQDVDDILPKSRLKKVIETKKPLQDRDVRINGLELVENEVPIQLK 300 

Query: 298 TQNRGYWFLRNWTETLFTLDCJLAHTTAYASALQAQTHQFKWQLHVIYGaJUDIEYYDELK 357

Q G + R+ TE +QL+ YA+AL+AQ+H+FMtT+LHVI GL ++ YD+L 

Sbjct: 301 GQTVGAIATFRDKTEVKHLAEQLSGVKMYAHALRAQSHEFMNKLHVIIKSLVQLKEYDDIjG 360 
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Query: 358 lYLKELLEPQNEFIARLSMriVREPRIASFIIGBREKFAEKHIHLSTBILVEIPTKSTVED 417 

Y+K++ Q + + V+ LA F++G++ E+ NL E IP + 
Sbjct: 361 DYIIODIMCXSKSETSEIIiroVKSSVLAGFLLGRQSFIREQGaNLDIBCNGVIPNaaDPSV 420 

Query: 418 vm^h-UJJRimiKLhTI^-S^hVSfSBXSm 456 

++ + ++ IN + + + +++ + + N++++ + 

Sbjct: 421 IHELITIIGNLIMlTOLDaVADMPKKQITMSMRFHNSILDIE 461

A related DNA sequence was identified in S.pyogenes <SEQ ID 6947> which encodes the amino acid 
sequence <SEQ ID 6948>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.03 Transmembrane 174 - 190 ( 170 - 195) 
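
The INTEGRAL lines in this section share a fixed layout: a likelihood value, the predicted transmembrane segment, and a surrounding window in parentheses. The following is a minimal Python sketch of reading one such line, assuming that layout. The names `INTEGRAL_LINE` and `parse_integral` are illustrative and are not part of the analysis software used here.

```python
import re

# Assumed layout of one complete line, taken from the text above:
#   INTEGRAL Likelihood =-10.03 Transmembrane 174 - 190 ( 170 - 195)
INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    """Return a dict for a complete INTEGRAL line, or None if it does not match."""
    m = INTEGRAL_LINE.search(line)
    if m is None:
        return None
    start, end, win_start, win_end = (int(g) for g in m.groups()[1:])
    return {
        "likelihood": float(m.group(1)),
        "segment": (start, end),        # predicted membrane-spanning residues
        "window": (win_start, win_end), # range printed in parentheses
        "length": end - start + 1,      # inclusive residue count of the segment
    }

example = parse_integral(
    "INTEGRAL Likelihood =-10.03 Transmembrane 174 - 190 ( 170 - 195)"
)
print(example)
```

A line whose segment or window has been cut off returns `None`, so truncated entries are reported as unreadable instead of being completed with a guessed value.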

Final Results 

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 236/488 (48%) , Positives = 337/488 (68%) , Gaps = 3/488 (0%) 

Query: 3 MKKKLSLWAFLSLILVTMTICIFSIFYYVTIHQSYRMTOVQEEKILKNTGYM>SiaiPQVI 62 

MKK L LWA LSLILV+M + S+FY + +H +++ ++ QE +L +TG L+ + + 
Sbjct: 1 MKKPLRLWaSLSLILVSMIWTTSLFYGIMLHDTHQSIKNQETHLLTSTGKMLASHQAIK 60 



+ L +N + ++ NIiDY+Vh-+N+KG1R THP+ 

Sbjct: 61 EliLNNQENAKTTAYTNSIASIYISn^YVVVMNMKBIRLTHPNPKNIGKPFQGGDEEAVI^ 120 


Query: 123 GKAIMSTAEGSLGKSLRYLIPVYDHQKQVGAIAVGLKLTTLGDLSQSSIKEFSKPLLISI 182 

GK ++STA+G+I1GKSLRYI1+PV+D KQ+GAIAVG+KLTTL D++ +S + ++ LL+ + 
Sbjct: 121 GKKSn:STAKGTLGKSLRYLVPVFIX3DKQIGAIAVGIKLTTLND\m^TSKRNYTLSI^ 180 

Query: 183 LISLVVTSIISYGLKKQLHNLHPSDIFQHLEERNATLDQIQAAVFVIDQRHIIKRNNPAA 242

LISL+VTS IS+ LK+QLH L PS+I+Q EERNA LDQI4-AAVFV+D+ I+-1- N A 
Sbjct: 181 LISIVLVTSFISFRLKRQUIQDEPSEIYQLFEERNAMIiDQIERRVFVVDKaGILQIiCNQaG 240 

Query: 243 SLLFKKEGQRDLFSGKLLESLIPQLKQDHFSKKTEQVLHFQGQDYLLSISPITVKTQNRG 302 
L ++ Q +G L P + + EQ+ + +DYLL+ISPI VK +RG

Sbjct: 241 QKLIftRKCQLGKPTGNSFNYLFPDFPKLSLQEGHEQLFRYGEEDYLLAISPICVKNDHRG 300 

Query: 303 YWFLRNVTETLFTLDQLAHTTAYASALQAQTEQFmTOLHVIYGLADIEYYDELKIYLKE 362 
+++F+R + + TLDQLA+TTAYASALQAQTK+FMNQLHVIYGL DI YYD+LKIYL 
Sbjct: 301 HIIFMREAVKAIDTLDQLAYTTAYASALQAQTKKFMNQLHVIYGLVDIAYYDQLKIYLDS 360

Query: 363 LLEPQNEFLARLSMLVREPRLASFIIGSREKFAEICHINLSTEILVEIPTKSTVEDVNNYL 422 

+LEP+NE L LS+LV+EP LaSF+IGHl+EK+ E +++L ++L EIP +T +NN h 
Sbjct: 361 ILEPENEILTSLSVLVKEPLLASFLIGEQEKYQELNVHLKIDVLSEIPHSATKNQDSINGL 420 


Query: 423 IiLHRYINTKILTLLNSTTLVSLRIiNYQNNLIETDYQWENEKML-IJSDYHQYFNnAyFQQL 481 

+++R+I+T +LT L +LV + QtJ+LI + + W+ L F+ YFQQL 

Sbjct: 421 MIYRFIHTNLLTTLRPKSLVLSIQHDQNHLI--SHYTLTDNWIDIiERVQPIPDLPyFQQL 478 

Query: 482 LVDSRATY 489

L D+ 4 + 
Sbjct: 479 LTDTOSQF 486 

SEQ ID 9004 (GBS130d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 123 (lane 8-10; MW 63kDa) and in Figure 184 (lane 4; MW 63kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
123 (lane 11; MW 38kDa) and in Figure 181 (lane 7; MW 38kDa).

GBS130d-GST was purified as shown in Figure 237, lane 11. GBS130d-His was purified as shown in
Figure 233, lane 9-10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2247 

A DNA sequence (GBSx2368) was identified in S.agalactiae <SEQ ID 6949> which encodes the amino 
acid sequence <SEQ ID 6950>. Analysis of this protein sequence reveals the following: 

Possible site: 51 



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.52 Transmembrane 364 - 380 ( 353 - 336)
INTEGRAL Likelihood = -9.65 Transmembrane 33 - 49 ( - 57)
INTEGRAL Likelihood = -7.80 Transmembrane 87 - 103 ( 82 - 105)
INTEGRAL Likelihood = -6.35 Transmembrane - 169 ( 144 - 174)
INTEGRAL Likelihood = -4.41 Transmembrane 301 - 317 ( 300 - 318)
INTEGRAL Likelihood = -2.81 Transmembrane 216 - 232 ( 212 - 235)
INTEGRAL Likelihood = -2.39 Transmembrane 120 -
INTEGRAL Likelihood = -1.65 Transmembrane 57 - 73 ( - 73)
INTEGRAL Likelihood = -1.17 Transmembrane 428 -
INTEGRAL Likelihood = -0.32 Transmembrane 276 - 292 ( 276 - 292)

Final Results 

bacterial membrane --- Certainty=0.5607 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 





+IGSV LPVY+ A +1L+ L++LPVNMLGGFAVILTMGW LGTIG 1

P K+PGGPAILSLLVPSH-VFENL+N+NVL+ST++IJ1KQRNFLYFYIACLV GSILGMN

RK+L+QGL+RMI PM LGM+ AM VGT VG +LGL W+H+LFYIVTPVLAGGIGEGILPL

Query: 198 SLGYSSITGVASEQLVAQLIPATIIGNFFAILCTALLNRLGEJCKPHLSGQGQLWLNKGE 257
SLGYS+ITG+ SEQLV QLIPATIIGNFFAI+C+ LL+RLGEK+? L3GCGQL+++ +
Sbjct: 194 SLGYSAITGLPSEQLVGQLIPATIIGNFFAIMCSGLLSRLGEKRPELSGQGQLIKITWSD 253

PIDVK M6 GVL AC+LFI G LLQ LTGFPGPVLMIV AA LKY+NV+

FFVSRF+NMNPVEA I+SACQSGMGGTGDVAILSTA+RM LMPFAQVATRLGGAITVITM

A related DNA sequence was identified in S.pyogenes <SEQ ID 6951> which encodes the amino acid
sequence <SEQ ID 6952>. Analysis of this protein sequence reveals the following:



N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane ( 350 - 383)
INTEGRAL Likelihood = Transmembrane ( 79 - 102)
INTEGRAL Likelihood = Transmembrane ( 137 - 171)
INTEGRAL Likelihood = Transmembrane 117 -
INTEGRAL Likelihood = Transmembrane 425 - 441 ( 425 - 442)
INTEGRAL Likelihood = Transmembrane ( 209 - 232)
INTEGRAL Likelihood = Transmembrane 273 - ( 271 - 290)

Final Results

bacterial membrane --- Certainty=0.5755 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAB18291 GB:U35658 L-malate permease [Streptococcus bovis]
Identities = 344/443 (77%), Positives = 394/443 (88%), Gaps = 5/443 (1%)

I SKKMPQKDLSEHSKftWQNR RIGSVPLPVYLVLATLILVTGWLQQLPVNMLGGFAV 5 9 

+ KK+P +E W+N+ RIGSV LPVYLV A++ILVT L+QLPVNMLGGFAV 

MEKKLPATAANETD- -WRNKLTKTRIGSVTLPVYLVTASI ILVTALLEQLPVNMLGGFAV 58 



TPVLAGGIGEGILPLSLGYSAITG+ SEQLV QLIPATIIG1!IFF2U:+C+ LL+R GEK P 



SGQGQL+KI +S+D+SnaL+++ +DVKLMGaGVL AC+LFI GGLLQHLT FPGPV 



LMI+4-AAFIiKYIiNV+P+ETQ G+KQLYKFIS NFTFPIM GLG+LYIPLK+W LSW(3Y 



F+WISW TV++ GFFVSRP+NM+PVEAAI+SaCQSGMGGTGDVAILSTA+RM LMPPA 






QVATRLGGAITVITMTAI I 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 356/419 (84%) , Positives = 385/419 (90%) 

Query: 27 KlGSVPLPVYVCLAIJ:.ILLAGFLQKLPVNMLGGFAVILTMGWFLGTIGaSIPGFKlS^ 86 

+IGSVPLPVY+ LA LIL+ G+LQ+LPVHMLGGFAVIIfl'+GW LGTIGA+IPG K+PGGP 
Sbjct: 24 RIGSVPLPWLVLATl,ILVTGWjU3QLPVNMLGGFAVILTLGWLLGTIGATIPGLKHFGGP 83 
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Query: 87 AILSLLVPSILVFFHLINKNVLESTNMMKQMIFIiYFYIACLVSGSILGMNRKMLIQGLL 146 

AILSLLVPSILVFFNL+N NVLE+TN+LMKQaNFLYFYIACLV GSILGMNRK+LIQGL 
Sbjct: 84 AILSLLVPSILVFFNLLNP]m,EaTNVIMKQMS(FLYFYmCLVaKILGMK^ 143 

Query: 147 RMIFPMLLGrWCAMMVGTFVGVILGLEWRHTLFYIVTPVLaGGIGEGILPLSLGYSSITG 206 

RMI PMLLGIWCAM VGT VGVILGL+W+HTLFY+VTPVIiAGGIGEGILPLSLGYS+ITG 
Sbjct: 144 imIIP^aJIm?CMlGVGTLVGVTMLDWQHTLFYVVTPVIAGGIGEGILPLSLGYSAITG 203 

Query: 207 VASBQLVAQLIPATIIGNFFAlLCTALIiISmiiGEKKPHLSGQGQLVRLNKjGESDMSDIIftDH 266 

V SEQLVAQLIPATIIGNFFAILCTAUUSIR GEK P SGQGQLV++ EDMSD + D-l- 
Sbjct: 204 VGSEQLVaQIiIPATIIGNFFAILCTALLNRFGEKHPSYSGQGQLVKIGHSEDMSDALKDN 263 

Query: 267 SGPIDVKKMGGGVLTACSLFIFGHIiQQLTSPPGPVIJirWiAAirJW 326 

SG +DVK MG GVLTACSLFI G LLQ LT FPGPVLMI+ AA LKY+NVIP+ETQNGAK 
Sbjct: 264 SGAICVKLMHAGVLITVCSLFIAGGLLQHLTDFPGPVmilLAAFLKYIWIPQETQISKSS^ 323 

Query: 327 QLYKFISGNFTFPLMAGLGLLYIPLKDWATLSIQYFIWISWFTVISVGFPVSRFLNM 386 

QLYKFIS NFTFPLMAGLGLLYIPLK+WATLS QYFIWISW TV+SVGFFVSRFLNM 
Sbjct: 324 QLYKFISSNFTFPLMAGLGLLYIPLKEWATLSWQYFIWISWLTWSVGFFVSRFLNM 383 

Query: 387 NPVEAGIISACQSG^raGTGDVAILSTADRMNLMPFAQVATRLGGAITVITMTAILRMLF 445 

+PVEA IISACQSGMGGTGDVAILSTADRMNLMPFAQmTRLGGAITVITMTAILR++F 
Sbjct: 384 SPVEAAIISACQSGMGGTGDVAILSTADR.WLMPFAQVATRLGGAITVITMTAILRIIF 442 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2248 

A DNA sequence (GBSx2369) was identified in S.agalactiae <SEQ ID 6953> which encodes the amino 
acid sequence <SEQ ID 6954>. This protein is predicted to be malic enzyme (mae). Analysis of this protein 
sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.28 Transmembrane 164 - 180 ( 164 - 181)

Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB07709 GB:U35659 malic enzyme [Streptococcus bovis]
Identities = 285/386 (73%), Positives = 332/386 (85%), Gaps = 1/386 (0%) 

Query: 2 SENLGQLAINQARENGGKLEVISKVKVEDKRDLSIAYTPGVASVSSAIAEDVELAYELTT 61
++++ QA++ GGKLEV KV +E K DL lAYTPGVA+VSSAI E E AYELTT
Sbjct: 3 TKDVKEIAIEQAKJCFGGKLEVCPKVPIETKADLGIAYTPCSVRAVSSAIYEKKERAYELTT 62

C ++PTFGGINLEDISAPRCFEIEQRLI+E DIPVFHDDQHGTAIWLAflL+NSLKLH-

K lEDI W+NGGGSAGLSITRK L+AG KH+ +VDR GI++-^ D 4-L PHH lAKLT

NRE ++G L ALE ADVF+GVSAP L EWI +M ++P++FAMANP+PEI+PD+AL A
Sbjct: 242 WEHRTGDmTM;ESM)VEVGVSRPGVLKPimiQ®4tffiQPVIFJaflRNPVPEIFPDEALAA 301

Query: 302 GAYIVGTCRSDFPMQINNVLAFPGIFRGaVLnARAKTITVBMQIAaflRGIASLIPEEELST 361 

GAYIVGTGRSDFPNQINNVLAFPGIFRGALDARAK IT+EMQIiiflA+GIA LIP+ EX+ 
Sbjct: 302 GAYIVGTGRSDFPNQINNVLftFPGIFRGALaARAKKlTIBMQIflftftKGIAKLIPDNELTP 361 

Query: 362 THIIPNAFQNDVADWAKSVBNAVQK 387 

T+IIP+ FQ VA WA+SV NaV++ 
Sbjct: 362 TNIIPDPFQEGVAKWAESVENAVKE 387 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6955> which encodes the amino acid
sequence <SEQ ID 6956>. Analysis of this protein sequence reveals the following:

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.44 Transmembrane 1S4 - 180 ( 154 - 181) 
INTEGRAL Likelihood = -1.75 Transmembrane 94 - 110 ( 94 - 110)

Final Results 

bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 7 QLALEQfiKTFGQKLEVQPKVDIKTKHDLSIAYTPGYASVSSAIAKDKTLAyDLTTKK^ 66 

+I1A+EQAK FGGKLEV PKV I+TK DL lAYTPGVA+VSSAI + K AY+LTTKKNTV 
Sbjct: 8 EIAIEQAKKFtMKLEVCPKVPIETKADI^IAVTPGVAAVSSArraCKERAYELTTKK^ 67 

Query: 67 AVISDGTAVLSLGDIGPEAAMPVMEGKAALFKAFAGVDAIPIVIJ)TKDTEEIISIVKAIiA 126 

AVISDG+AVLGLG+IGPEAAMPVMEGKAALFK FAGVD+IP+VLDT+DTEEII VK LA 
Sbjct: 68 AVISDGSAVLGLGNIGEEaaMPVMEGKAALFKRFAGVDSIPLVLDTQDTEEIIQTVKFLA 127

Query: 127 PTPGGIMLBDISAPRCFEIEQRLIKECHIPVFHDDQHGTAIWLAAIFNSLKLLKKSLDE 186 

PTFGGINLEDISAPRCFEIEQRLI E IPVFHDDQHGTAIWLAA++NSLKL+ K +++ 
Sbjct: 128 PTPGGINLEDISAPRCEEIEQRLIDELDIPVFHDDQHGTAIWLAALYNSLKLINKKIED 187 

Query: 187 VSIVVNGGGSAGLSITRKLLAAGATKOTVVDKFGIINBQEAAQLftPHHLDIAKVTNREFK 246 

+ +V+NGGGSAGLSITRK LAAG + +VD+ GI++E + A L PHH +IAK+TNRE + 
Sbjct: 188 IHWlNGGGSAGLSITRKFLAAGVKHIIIVDRTGILSETDTA-LPPHHAEIAiCLTNREHR 245 

Query: 247 SGTLEDALEGADIFIGVSAPGVLKAEWISKMRARPVIFAMANPIPEIYPDSALEAGAYIV 305 

+G L ALEGAD+P+QVSAPGVLK EWI +M +PVIFAMANP+PEI+PDHaL AGAYIV 
Sbjct: 247 TGDLATALEGADVFVGVSAPGVLKPEWIQQMNBQPVIFAMANPVPEIFPDEALAAGAYIV 306

Query: 307 GTGRSDFENQINNVLAFPGIPRGALnARAKTITVEMQIAABKGIASLVPDnALSTTNIIP 366 

GTGRSDFEHQINNVIAFPGIFRGaLtaRAK IT+EMQIAAAKGIA L+PD+ L+ TNIIP 
Sbjct: 307 GTGRSDFPHQINNVLRFPGIFRGALDARAKKITIEMQIARRKGIAKLIPDNELTPTNIIP 366 

Query: 367 DAFKEGVAEIVAKSVRSW 385 

D F+EGVA++VA+SVR+ V 
Sbjct: 367 DPFQEGVAKTVAESVRlffiV 385 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 306/387 (79%), Positives = 349/387 (90%) 

Query: 61 TKKmTaWSDGSAVIfiLGDIGPEAAMPVMEGKAALFKRFANVDAVPIVLKIl^ 120 

TKKimZAV+SDG+AVLGLGDIGPEAAMPVMEGKAALFK FA VDA+PIVL T DTEEIIS 
Sbjct: 61 TKKNTVAVISDGTAVraiCDIGPEAAMPVmGKAALFKAFAGVDAIPIVLOTKDTEElIS 120 
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Query: 121 IVKAISPTFCSGlMLEDISMRCFEIEQKLISECDlPVFHDDQHGTAIWLftMjENSLKLV 180 

IVKA+4-PTFGGINLEDISAPRCFEIEQRLI+EC IFVFHDDQHGTAIWLAA+FNSLKL+ 
Sbjct: 121 IVKAIiAPTFGGINLEDISAPRCFEIEQRLIKECHIPVFHDDQHGTAIVVIAAIFNSLKLL 180 

Query: 181 KKDIEDIRVVVNGGGSAGLSITRKLLSAGAKHVTVVDRFGIINDKDRESLAPHHKAIAKL 240 

KK ++++ +VVNGGGSAGriSITRKLL+AGA VTVVD+FGIIN+++ LAPHH IAK+ 
Sbjct: 181 KKSLDEVSIVVNGGGSAGLSITRKLIJUiGATKVTVVDKFGIINEQERftQL^ 240

Query: 241 TimEFQSGSLEDALENADWIGVSAPBAIJIAEWISKMftDKPIVEAMftNPIPEIYPDQALK 300 

TNREF+SG+LEDALE AD+FIGVSAP L AEWISKMA +P++ERMaNPIPEIYPD+AL+ 
Sbjct: 241 TNREFKSGTLEDALEGADIFIGVSAPGVLKAEWISKMAARPVIBaMANPIPEIYPDEALE 300 

Query: 301 AGAYIVGTGRSDFPNQINNVLAFPGIFRGALDARAKTITVEMQIftMRGIASLIPEEELS 360 

AGAYIVGTGRSDFPNQINNVLAFPGI FRGALDARAKTITVEMQIftaA+GIASL+P4-+ LS 
Sbjct: 301 AGAYIVGTGRSDFPNQINNVLAFPGIFRGALDARAKTITVEMQiaaaKGIASLVPDDaLS 360 

Query: 361 TTHIIPNAFQNDVADWAKSVSNAVQK 387 

TT+IIP+aF+ VA++VAKSV + V K 
Sbjct: 361 TTNIIPDAFKEGVaEIVi^SVRSWLK 387 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2249 

A DNA sequence (GBSx2370) was identified in S.agalactiae <SEQ ID 6957> which encodes the amino
acid sequence <SEQ ID 6958>. This protein is predicted to be Bta. Analysis of this protein sequence
reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 29 - 45 ( 29 - 45)



Final Results 

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MYSFEELLATMTLITAAEIEDKIDSNQDFVLFIGRISCPFCHLFVPKIVEVADEDEFELF 60 

MF + ++ + T +++D+ FIGR +CP+C F + V E + ++ 
Sbjct: 1 MEQFLDNIKDLEVTTWRAQEALDKKETATFFIGRKTCPYCRKFAGTLSGWAKTKAHIY 60 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6959> which encodes the amino acid
sequence <SEQ ID 6960>. Analysis of this protein sequence reveals the following:

D N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0900 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
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Identities = 39/111 (35%) , Positives = 66/111 (59%) 
Query: 3 SFEELLATMTLITAAEIEDKIDSNQDFVLFIGRISCPFCHLFVPKIVEV2U)EDEFELFHL 62 



Query: 63 DSEDFDHWTANKEFRNKTOIPTVPGIWVKNGTIKVKCDSKMrKEEIREFI 113 

DSE+ FR Y + TVP L+V + + CDS +T ++I F+ 

Sbjct: 71 DSElBUmaaEL2W^REl]YQLVTVPALLVSYDQHQRAVCDSSLTPDDIIAFL 121 

SEQ ID 6958 (GBS427) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 80 (lane 5; MW 16.2kDa). 

GBS427-His was purified as shown in Figure 214, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2250 

A DNA sequence (GBSx2371) was identified in S.agalactiae <SEQ ID 6961> which encodes the amino 
acid sequence <SEQ ID 6962>. Analysis of this protein sequence reveals the following: 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.75 Transmembrane 2 - 18 ( 1 - 21) 

Final Results 

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9437> which encodes amino acid sequence <SEQ ID 9438> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.




Query: 61 RMVLDVDGVYLTFELAAIKS 80 

++ LD +G + F+ +I++ 
Sbjct: 61 KVTLDCEGAPFDFDQQSIRT 80 
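
The "Identities" figures quoted with each alignment count the columns in which the Query and Sbjct residues are the same. The following is a minimal Python sketch of that count, applied only to the single row shown above (residues 61 to 80), so its result covers that row and not the whole alignment. The function name `identity_summary` is illustrative, and the truncating percentage is an assumption chosen because it is consistent with identity percentages printed in this section (for example 332/377 shown as 88%).

```python
def identity_summary(query, subject):
    """Count identical columns between two aligned rows of equal length.

    Returns (matches, aligned_length, percent), where percent is truncated
    to an integer (an assumption, see the note above the code).
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A column counts as an identity when both rows carry the same residue;
    # gap characters ("-") never count as a match.
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    length = len(query)
    return matches, length, (100 * matches) // length

# Row 61-80 of the alignment above.
row = identity_summary("RMVLDVDGVYLTFELAAIKS", "KVTLDCEGAPFDFDQQSIRT")
print("Identities = %d/%d (%d%%)" % row)
```

The five matching columns found this way (L, D, G, F and I) are the same residues that appear as letters in the middle line of the row. The "Positives" count would additionally need a substitution matrix and is not attempted here.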

A related DNA sequence was identified in S.pyogenes <SEQ ID 6963> which encodes the amino acid
sequence <SEQ ID 6964>. Analysis of this protein sequence reveals the following:

Possible site: 60 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.10 Transmembrane 3 - 19 ( 1 - 22)
INTEGRAL Likelihood = -3.03 Transmembrane 63 - 79 ( 63 - 79)

Final Results

bacterial membrane --- Certainty=0.3442 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



WO 02/34771 PCT/GB01/04789



>GP:BAA11328 GB:D78257 ORF11 [Enterococcus faecalis]
Identities = 29/75 (38%), Positives = 52/75 (68%)

Query: 6 ILMFTVMrGLIWEMQRQQKKQaQERQNQDSlMEKGDEVVTIGGMEaiTO 65
++M +V++ + +++ R QKKQ +ERQ+ m ++ GD WTIGG+ ++ E+ + KK+ L

Sbjct: 5 hXmSLVJVmiTn,VRTi3IWy2^ 64

Query: 66 DVDGVFLTFELLRIK 80
D +G F F+ +1+
Sbjct: 65 DCEGAPPDFDQQSIR 79

An alignment of the GAS and GBS proteins is shown below. 

Identities = 53/90 (70%) , Positives = 80/90 (88%) 

Query: 4 PIIMLWMVGra^FFMQRQQKKQAQERQKQIlWQKGDBIVTIGGIiPGVVDEWrEAQRMV 63

PI+M WM+G+++FMQRQQKKQAQERQ QUS1A++KGDE+VTIGG+F +-VDEV+T A+++V
Sbjct: 5 PILMFVVMLGLIWFMQRQQKKQRQERQNQIJSIAIEKGDEVVTIGGMERIVDEVDTTAiaaV 64

Query: 64 LDVDGVYLTPELflAIKSWSKSATPTEPVE 93
LDVDGV+LTPEL AIK +V+KA T T VE

Sbjct: 65 LDVDQVFLTFELLAIKRIVTKaTTETTLVB 94

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2251

A DNA sequence (GBSx2372) was identified in S.agalactiae <SEQ ID 6965> which encodes the amino
acid sequence <SEQ ID 6966>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2252

A DNA sequence (GBSx2373) was identified in S.agalactiae <SEQ ID 6967> which encodes the amino 
acid sequence <SEQ ID 6968>. Analysis of this protein sequence reveals the following: 

Possible site: 16
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 164 - 180 ( 164 - 180)

Final Results

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB61731 GB:AL133220 putative oxidoreductase [Streptomyces coelicolor A3(2)]
Identities = 72/216 (33%) , Positives = 120/216 (55%) , Gaps = 1/216 (0%)

Query: 14 AQALEARGQKLYSVaiTOTYDKGLEFATKyGIQKVYDHIDQVFEDPEVDIiyiSTPHNTHI 73 

A ++ ++,+VA+RT FA ++GI + Y + + D +VD++y++TPH+ H 

Sbjct: 25 ADLVDLPDAEWVAVASRTEASAKTFABKEGIPRAYGGWETLARDEDVDVVyVATPHSAHR 84 

Query: 74 SFLRKAUiNSKHVLCEKSITUISTELKEAIDMETNHVVIAEAMTIFH^ 133 

+ h G++VLCEK TLIN+ E E + LA N V L EAM ++ P+ R+LK LV 

Sbjct: 85 TAAGLaiEASRNVLCEKPFTUSlAREAAELVALARENGVFLMEAMN^ 144 

Query: 134 SGKLGPLroaQMNFGSYKEK)MTNRFFSRDLACK3MiLDIGVYALSCIRWFMSEM 193 

G +G ++ +Q +FG + +R GGALLD+GVy +S + + E P ++ + 

Sbjct: 145 DGRIGEWSLQaDFGlAGPFPARHiaira>PAQC3GGRLII)LGVYPVSFAQLLLGE-PTDVAA 203 

Query: 194 QVTFAPTGVDEQVGILLTNPANEMA'n/SLSLHAKQP 229 

+ + GVD Q G LL+ + +A++ 3+ P 
Sbjct: 204 RAVDSEEGVDLQTGALLSYGNnRLASIHCSITGGTP 239 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
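
Each alignment is summarised by a score line of the form "Identities = a/b (p%)". The following minimal Python sketch recomputes the percentage from the two counts; the name `score_fractions` is hypothetical, and rounding down to a whole percent is an assumption that happens to agree with the oxidoreductase alignment above (72/216, 120/216 and 1/216). Printed values elsewhere in a scanned copy may differ because of scan errors.

```python
import re

# Captures "Identities = 72/216", "Positives = 120/216", "Gaps = 1/216".
SCORE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def score_fractions(line):
    """Return {label: (count, length, percent)} for one score line.

    The percentage is the ratio rounded down to a whole number
    (assumption, see the lead-in).
    """
    out = {}
    for label, num, den in SCORE.findall(line):
        num, den = int(num), int(den)
        out[label] = (num, den, (100 * num) // den)
    return out
```

Comparing the recomputed percentage with the printed one is a quick way to flag score lines in which a digit has been misread.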

Example 2253 

A DNA sequence (GBSx2374) was identified in S.agalactiae <SEQ ID 6969> which encodes the amino 
acid sequence <SEQ ID 6970>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4957 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 2254 

A DNA sequence (GBSx2375) was identified in S.agalactiae <SEQ ID 6971> which encodes the amino 
acid sequence <SEQ ID 6972>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1892 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2255

A DNA sequence (GBSx2376) was identified in S.agalactiae <SEQ ID 6973> which encodes the amino 
acid sequence <SEQ ID 6974>. This protein is predicted to be a host cell surface-exposed lipoprotein. 
Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -7.75 Transmembrane 9 - 25 ( 5 - 28)

Final Results

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9005> which encodes amino acid sequence <SEQ ID 9006>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of UR: 24
Peak Value of UR: 2.84
Net Charge of CR: 2
McG: Discrim Score: 10.29
GvH: Signal Score (-7.5): -4.34
Possible site: 34
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 1 value: -7.75 threshold: 0.0
INTEGRAL Likelihood = -7.75 Transmembrane 5 - 21 ( 1 - 24)
PERIPHERAL Likelihood = 13.31 86
modified ALOM score: 2.05
icm1 HYPID: 7 CFP: 0.410

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC03455 GB:AF020798 putative host cell surface-exposed lipoprotein [Streptococcus thermophilus bacteriophage TP-J34]
Identities = 40/102 (39%) , Positives = 63/102 (61%) , Gaps = 10/102 (9%)

Query: 101 KNALISAKIYSKnmiLSKQSIFEQLYSESPDKATHSDKFTKEESQYAIDHLKUDFKENAL 160
+ A+ AK Y+ T+++SK+ + QL S DK++++ S YA+++ +t)4- + AL
Sbjct: 51 RTAVSKAKQVASTVHMSKEELRSQLVS FDKySQDASDYAVENSGIDYNKQAL 102

Query: 161 ETAKSYQSSSSLSKEEIYKQLTSTLGDKFTNDEAQYAVDHLK 202
E AK YQ + S+S + I QL S DKFT +EA YAV +LK
Sbjct: 103 EKAKQYQDTLSMSPDAIRDQLVSF--DKFTQEEADYAVaNLK 142

Identities = 40/112 (35%) , Positives = 64/112 (56%) , Gaps = 9/112 (8%) 

Query: 41 KKAKIKFimQKKIVKKaREYAICSGHMSKDSIIEKLKiaDSKKniQEDINFVIMmjKVDYK 100
+ ++ K K + V KA++YA + HMSK+ + +L K Y Q+ ++ + N +DY
Sbjct: 40 QSSESKVPKBYRTAVSKAKQYASTVHMSKEELRSQLVSFDK-YSQDASDYAVENSGIDVN 98

Query: 101 KHALISAKIYSKTMNLSKQSIFEQLYSESPDKATHSDKFTKEESQYAIDHLK 152
K AL AK Y T+++S +1 +QL S DKFT+EE+ YA+ +LK
Sbjct: 99 KQALEKAKQYQDTLSMSPDAIRDQLVS FDKFTQEEADYAVANLK 142

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 9006 (GBS122) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 6; MW 21.9kDa).

GBS122-His was purified as shown in Figure 202, lane 8.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.
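
The two analyses in Example 2255 each report one predicted membrane-spanning segment on an "INTEGRAL Likelihood" line. The following minimal Python sketch extracts those segments; `parse_tm_segments` and the dictionary keys are hypothetical names. Reading the parenthesised range as a wider window around the predicted segment is an assumption, since the text does not define it.

```python
import re

# Matches e.g. "INTEGRAL Likelihood = -7.75 Transmembrane 5 - 21 ( 1 - 24)"
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for like, s0, s1, w0, w1 in TM.findall(text):
        segments.append({
            "likelihood": float(like),
            "segment": (int(s0), int(s1)),  # residues named after "Transmembrane"
            "window": (int(w0), int(w1)),   # parenthesised range (meaning assumed)
        })
    return segments
```

For the two sequences of Example 2255 the reported segments are 9 - 25 and 5 - 21, each 17 residues long and with the same likelihood value of -7.75.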

Example 2256 

A DNA sequence (GBSx2377) was identified in S.agalactiae <SEQ ID 6975> which encodes the amino 
acid sequence <SEQ ID 6976>. This protein is predicted to be transposase (orfA). Analysis of this protein 
sequence reveals the following: 

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2830 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB90833 GB:AJ250837 hypothetical protein [Streptococcus dysgalactiae]
Identities = 91/96 (94%) , Positives = 93/96 (96%)

Query: 1 MSRKVRimFTDDFKQQIVDLYNVGRKRSSLlimELTPSTFDKWVEQAKTTGSFKSIDNL 60 

MSRK+RRHFTDDFKQQIVDLYN GRKRSSLIK YELTPSTFDKWVRQZOCTTGSFKS+DNL 
Sbjct: 1 MSRKIRRHFTDDFKQQIVDLYNAGRKKSSLIKEYELTPST™kWVRQAKTTGSFKSVDNI. 60 


Query: 61 TDEQRELIEDRKHNKELEMQLDILKQAAVIMAQKGK 96 

TDEQRELIELRK NKELEMQLDILKQAaVIMAQKGK 
Sbjct: 61 TDEQRELIELRKRNKELEMQLDILKQAAVIMRQKGK 96 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2257 

A DNA sequence (GBSx2378) was identified in S.agalactiae <SEQ ID 6977> which encodes the amino 
acid sequence <SEQ ID 6978>. This protein is predicted to be transposase (orfB). Analysis of this protein 
sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2618 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9915> which encodes amino acid sequence <SEQ ID 9916>
was also identified.

A related GBS nucleic acid sequence <SEQ ID 9903> which encodes amino acid sequence <SEQ ID 9904>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB90834 GB:AJ250837 putative transposase [Streptococcus dysgalactiae]
Identities = 243/259 (93%) , Positives = 250/259 (95%)

Query: 1 MCRWUmPHSSYVYQAVESVSETBFEETIKRIFLDSESRYGSRKIKICIiIilKEGITrjSRRR 60
Sbjct: 1 MCRWLNXPRSSYYYKAVEPVSEAELEESIKAXFLESKARYGSRKIKICLinTEGITLSRRR 60

Query: 61 IRRIMKRLNLVSVYQKA.TFKEHSRGKNiaPIENHIiDRQFKQERPLQBLOT^ 120
Sbjct: 61 LQRL 120

Query: 121 WAWCLIIDLYIvmEIIGLSLGWHKTAELVKQAIQSIPYALTIOTKMFHSDRGKEFDMQLID 180
WAYVCLIIDLyUREIIGLSLGWHKTAELVKQAIQSIPY LTKVKMFHSDRGKEP+NQLID
Sbjct: 121 WAYVCLIIDLYNREIIGLSLGWHKTAELVKQAIQSIPYPLTKVKMFHSDRGKEFNNQLID 180

Query: 181 EILEAFGITRSLSQAGCPyDNAViffiSTYHAFKIEFVYQETFQLLEELALKTKDYVHWWNY 240
EILEAFGITRSLSQAGCPYDNAVAESTYRAFKIEFVYQETFQ LEELALKTK YVHWWNY
Sbjct: 181 EILEAFGITRSLSQAGCPYDNAVAESTYRAFKIEFVYQETFQSLEELRLKTKAYVHWWNY 240

Query: 241 HRIHGSLNYQTPMTKRLIA 259
HRIHGSUQYQTPMTKRLIA
Sbjct: 241 HRIHGSENYQTPMTKRLIA 259

There is also homology to SEQ ID 32.

There is also homology to SEQ ID 32. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2258 

A DNA sequence (GBSx2379) was identified in S.agalactiae <SEQ ID 6979> which encodes the amino 
acid sequence <SEQ ID 6980>. This protein is predicted to be pXO1-128. Analysis of this protein sequence
reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3684 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD32432 GB:AF065404 pXO1-128 [Bacillus anthracis]
Identities = 45/69 (65%) , Positives = 52/69 (75%)

Query: 17 MKKAGKSNKVimTLGIKNNSQrmJMKWYENEELYRFHQGVGKQYTYGKGLEHLSEVEQ 76 
MKK SNR IME LGIKN SQI TWMKWY ++ YEF Q VGRBY+YGKG + LSE+EQ 
Sbjct: 1 MKKESYSNRTIMEIMIKNVSQIKTWMKWYRaDQTYRFQQPVSKQYSYGEGPKEIjSEIiEQ 60

Query: 77 LQLQVDLLK 85 

L+L+ LK 
Sbjct: 61 LRLENKHLK 69 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2259

A DNA sequence (GBSx2380) was identified in S.agalactiae <SEQ ID 6981> which encodes the amino
acid sequence <SEQ ID 6982>. This protein is predicted to be transposase. Analysis of this protein
sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2260 

A DNA sequence (GBSx2382) was identified in S.agalactiae <SEQ ID 6985> which encodes the amino 
acid sequence <SEQ ID 6986>. This protein is predicted to be Lmb. Analysis of this protein sequence 
reveals the following: 

Possible site: 18
>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1595> which encodes the amino acid
sequence <SEQ ID 1596>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 302/306 (98%) , Positives = 303/306 (98%)

Query: 1 MKKOTETjyBMWSLVMIAGODKSlfflPKQPTQGMSVWSFYPiy^^ 60
MKK FFLMAMWSLVMIAGCDKSaKPKQPT(3GMSVVTSPYPMyAMTKEVSGDI^
Sbjct: 1 MKKGFFIiMfiMWSLVMIAGCDKSRNPKQPTQGMSVVTSFyPMYAMTKBVSGDIit^^ 60

Query: 61 SGAGIHSFEPS\raDVftAITOfiDLFVYHSHTLEAWJa©LDPNLKKSKVNVFERSKPL^ 120
SGaGIHSFEPSVNDViiAIVnADLFVXHSHTLBAMaRDLDENLKKSKV+VFEaSKPLTIiDR
Sbjct: 61 SGftGlHSFEPSVlSnmAITOftDLFVYHSHTIKiWBRDinDPlSILKKSK^^ 120

Query: 121 VKGLEDMEOTQGIDPATLYDPHTVmJPVLMEERWIJOTIfiHLDPKHKDSYTKKA^^ 180
VKGLEDMEVTQGIDPATLVDPHTWTDPVLAGBEaVNIAKELG LDPKHKDSYTK AKSFK
Sbjct: 121 VKeLSDMEVTQGIDPATLTOPHTOTDP^^JVGEEAVNIAKEIlGRIlDPKHKDSY^K^^ 180

Query: 181 KEAEQliTEEYIQKFKKTOSKTFVTQOTAFSYIiAKRFQLKQLGISaiSPEQEPSPRQLKEI 240
KEAEQLTEEYTQKFKKVRSKTEVTQHTaFSYLAKRFGLKQLGISGISPEQEPSPRQLKEI
Sbjct: 181 KSaffiQLTEBYTQKFKKVRSKTFVTQHTAFSYIJUaiPGLKQLGISGISPEQEPSPRQIjKEI 240

Query: 241 QDFVKEVNVKTIPflEDNVNPKIflHAiaKSTGJUWKTIiSPLEftAPSGl^^ 300 

QDFVKEYNVKTIFMDNVNPKIMaiAKSTGSUCVKTLSPLEA^ 
Sbjct: 241 QDF^raYNVKTIFAE0NVNPKIJ«ffiIl«KTGAKVKTLSPLEAAPSGffiCra 300 

Query: 301 LYQQLK 306 

LYQQLK 
Sbjct: 301 LYQQLK 306 

There is also homology to SEQ ID 4. 

SEQ ID 6986 (GBS189) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 38 (lane 2; MW 35.2kDa). 

The GBS189-His fusion product was purified (Figure 204, lane 7) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 248A), FACS (Figure 248B), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2261 

A DNA sequence (GBSx2383) was identified in S.agalactiae <SEQ ID 6987> which encodes the amino 
acid sequence <SEQ ID 6988>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4656 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB41455 GB:U34956 phosphoribosylformylglycinamidine synthase
[Mycobacterium tuberculosis]
Identities = 73/237 (30%) , Positives = 112/237 (46%) , Gaps = 25/237 (10%)

Query: 43 GAGGVCVAIGBLliD-
G G+ A ELA
Sbjct: 282 GGAGLSCATSELASAGDGGMTIQLDSVPLEAKEMTPAEVLCSESQERMCAWSPKNVDAF 341

Query: 99 lAACNKENIDAVWATVTEKPNLVMTWNGETIVDLERCPLDTNG VRWVDAKW 152
+A C K + A V+ VT+ L +TW+GET+V0+ + G V +
Sbjct: 342

Query: 153
Sbjct: 402

Query: 211
PY OA A+ EA
Sbjct: 462

There is also homology to SEQ ID 982.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2262 

A DNA sequence (GBSx2384) was identified in S.agalactiae <SEQ ID 6989> which encodes the amino 
acid sequence <SEQ ID 6990>. This protein is predicted to be 30S ribosomal protein S11 (rpsK). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0598 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9281> which encodes amino acid sequence <SEQ ID 9282> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10919> which encodes amino 
acid sequence <SEQ ID 10920> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 2 HGNAIiAWSSAGALGFKGSRKSTP?AA.QMAftEAAAKSAQEHGLKTVEVTVKGPGSGRESAI 61 

HGNA++WSSAGALGF+GSRKSTPFAJ\QMAflE AAK ~ EHGLKT+EVTVKGPGSGRE+AI 
Sbjct: 40 HG>JAISWSSAGaLGPRGSRKSTPFAAQMAflETAaKGSIEHGLKTLEVTVKGPGSGREaAI 99

Query: 62 RALaaAGLEVTAIRDVTPVPHNQARPPKRRRV 93

RAL AaGLBVTAIRDVTPVPHNG RPPKRRRV 
Sbjct: 100 RALQaftGLBVTAIRDVTPVPHNGCKPPKRRRV 131 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6991> which encodes the amino acid
sequence <SEQ ID 6992>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0945 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/93 (98%), Positives = 93/93 (99%) 

Query: 1 ^fflGNaIAWSSAGALGPKGSRKSTPPAAQMAflEaAAKSAQEHGLKTVEVTVI^ 60 

+HGN7M^WSSAGALGETCGSRKSTPFAAg4AAEARRKS&QRHGLraOTVTVHGPGSGEEaA 
Sbjct: 35 VHGNAIAWSSAGALGFKGSRKSTPFAAQMAAEAAAKSAQEHGDKTVEVTVKGPGSGRBSA 94 

Query: 61 IRALaAAGLEVTAlRDVTPVPHNGaRPPKRRRV 93 

IRALAflAGLEVTAIRDVTPVPHNGARPPKRRRV 
Sbjct: 95 IRAIAftAGLEVTAIRDVTPVPHNGARPPKRRRV 127 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2263

A DNA sequence (GBSx2385) was identified in S.agalactiae <SEQ ID 6993> which encodes the amino
acid sequence <SEQ ID 6994>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2551 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB03881 GB:AP001507 DNA-directed RNA polymerase alpha subunit [Bacillus halodurans]
Identities = 190/314 (60%), Positives = 249/314 (78%), Gaps = 2/314 (0%)

Query: 1 MIEFEKPIITKIDENKD--YGRFVIEPLERGyGTTLGNSLRRVLLSSLPGAA.VTSIKIDG 58
MIE EKP+I 1+ ++D YG+FV-fEPLERGYGTTLGNSLRR+LLSSLPGAA.VTS++IDG
Sbjct: 1 MIEIEKPVIETIEISEDAKYGKFWEPLERGyGTTLGNSLRRILLSSLPGAAVTSVQIDG 60

Query: 59 VLHEFDTIPGVREDVMQIItNVKGLAVKSYVEDEKIIELDVEGPAEITAGDILTDSDIEI 118
VUJEF TI GV EDV I+LN+K LA+K Y +++K +E+D +G +TAGD+ DSD+4+
Sbjct: 61 VIflEFSTIEGVVEDVTTIVIMjKQIjftLKIYSDEDKTLEIDTC2GBGVVTM^ 120

Query: 119 VNPDHYLFTIAEGHSLKATMTViUaTOGYVPAEGNKKDnaPVGTIiAVDSIYTEVKKVNy 178
+NPD ++ T+ G L+ +T + RGYVPAEGNK D+ +G + +DSIYTPV +VNyQV
Sbjct: 121 JajPDLHIATLTTGAHLRMRITAKRGRGYVPAEGNKSDELRIGVIPIDSIYTPVSRVNyQV 180

Query: 179 EPARVGSMDGPDKLTIEI^ra^GTIIPEDALGLSARVLIEHLlOTlFroLTEVAKATE^mKE^ 238
E RVG +DKLT+++ T+G+I PE+A+ L A++L EHEN+F I.T+ A4- E+M E
Sbjct: 181 EOT:RVGQ\mTYDKLTLDVWTDGSIRPEEAVSLGAKILTEHrilIPVGLTI)QAQNZffiI^WEK 240

Query: 239 EKVra^EKVLDRTIEELDLSVRSYNCLKRAGINTVFDLTEKTEPEmKTONLGRKSLEEVK 298
E+ EKVL+ TIEEDDLSVRSYNCLKRRGIHTV -hLT+KTE +MMBCVRNIiGRKSIiEEV+
Sbjct: 241 EEDQKEKA^JEMTIEELDLSWS™CLKRftGINTVQELTQKTEED^lMKVRNLGRKSIlEEVQ 300

Query: 299 IKLADLGLGLKNDK 312
KL +I1GI1GL+ ++
Sbjct: 301 EKLGELGLGLRKEE 314

A related DNA sequence was identified in S.pyogenes <SEQ ID 6995> which encodes the amino acid 
sequence <SEQ ID 6996>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2551 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 305/312 (97%) , Positives = 311/312 (98%) 

Query: 1 MIEFEKPIITKIDENKDYGRFVIEPLERGYGTTLGNSLRRVLLSSLPGAAVTSIKIDGVL 60
MIEFEKPIITKIDENKDYGRFVIEPLEaraYGTTLGNSLRRVLLSSLPGAAVTSIKIDGVL
Sbjct: 1 MIEFEKPIITKIDENKDYGRWIEPIiERGYGTTlXSNSIiREVLLSSLPGAAVTSIKIDGVL 60

Query: 61 HEFDTIPGTOEDVMQIItNVKGLAVKSYVEDEKIIELDVEGPAEITAGDILTDSDIEIVN 120
HEFDTIPGTOEDVMQIIUlVKGIAVKSYVEDEKIIElH-VEGPAE+TAGDILTDSDIE+VN
Sbjct: 61 HEFDTIPGVREDVMQIILNVKGLAVKSYVEDEKIIELEVEGPAEVTAGDILTDSDIELVM 120

Query: 121
PDHYLFTIAEGHSL+ATMTVAK RGYVPAEGNKKDDAPVGTI1A.VDS lYTPVKKVNYQVEP
Sbjct: 121

Query: 181
Sbjct: 181

Query: 241
VOTEKVIIJRTIEEIiDLSVRSYNCLKEAGIjtm/FnLTEK+EPEMMK^^
Sbjct: 241 VlTOEKVIiDRTIEBLDLSVRSXHCLKRafilimrFDLTEKSEPEMMKVim^ 300

Query: 301 LflDLGLGIiKNDK 312
LaDLGLGLKNDK
Sbjct: 301 lADLGLOriKKIDK 312

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2264 

A DNA sequence (GBSx2386) was identified in S.agalactiae <SEQ ID 6997> which encodes the amino 
acid sequence <SEQ ID 6998>. This protein is predicted to be 50S ribosomal protein L17 (rplQ). Analysis
of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1609 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11920 GB:Z99104 ribosomal protein L17 (BL15) [Bacillus subtilis]
Identities = 95/128 (74%) , Positives = 105/128 (81%) , Gaps = 8/128 (6%)

Query: 1 MAYRKLGRTSSQRKaMIja)LTTDLLIlSIESIVTlEU?aKEIRKTVEKMITIX3KRGDLH^ 60 

M+YRKLGRTS+QRKftMIiRDLTTDL+INE I TTE RAKE+R VEKMITLGKRGDIiHRRR 
Sbjct: 1 MSYRKLGRTSAQRKaMLRItt]TTDLIIMERIETTETRMCELRSVVEKMlTIfi 60 

Query: 61 QAAAYVRNEIASENYDEASDKYTSTTALQKLFDDIAPRYAERNGGYTRILKTEPRRGDM 120 

QAAAY+RNE+A+E ++ ALQKLF DIA RY ER GGYTRI+K PRRGD A 

Sbjct: 61 QAAAYIRNEVANEENNQ nALQKLPSDIATRYEERQGGYTRIMKLGPRRGDGA 112 

Query: 121 PMAIIELV 128 

PMAIIELV 
Sbjct: 113 PMAIIELV 120 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6999> which encodes the amino acid
sequence <SEQ ID 7000>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1609 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 125/128 (97%) , Positives = 127/128 (98%)

Query: 1 MATOKLGRTSSQRKMma3LTTDLLINESrmB!ARAKEIRKTVEKMITLGKE 60

MAYRKLCSRTSSQRKAMLRDLTTTILLINESIVTTEaRAKEIRKTVEKMITIfiKRGDL^^ 
Sbjct: 1 MATOKLGRTSSQRKAMLRDLTTDLLIlffiSIVTTEAIUUCErRKaVEKMITLGKRG^^ 60 


Query: 61 QaAAYVRNEIASENroEASDKXTSTTALQKLFDDIAPRYftERNGGYTRILKTEPRRGDSA 120 

QAAAYVRNEIASENYDEA+DKyTSTTALQKLP +IAPRYAERNGGYTRIIiKTEPRRGDAA 
Sbjct: 61 QAAAYVRM:iASEOTDEaTDKyTSTTM.QKLPSEIAPRYiffiRNa3YTRII.KTEPRRGDaA 120 

Query: 121 PMAIIELV 128

PMAIIELV 
Sbjct: 121 PMAIIELV 128 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
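
The "Identities" count reported with each alignment is the number of aligned positions at which the Query and Sbjct rows carry the same residue. The following minimal Python sketch shows that count for one pair of aligned rows; `count_identities` is a hypothetical helper, the comparison is case-sensitive, and the gap handling is an assumption. The "Positives" count is not reproduced here because it depends on a substitution matrix that the text does not name.

```python
def count_identities(query, sbjct):
    """Count aligned positions where both rows carry the same residue.

    Both rows must come from the same alignment block and therefore have
    equal length. A gap character ('-') never counts as an identity.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
```

For the final block of the alignment above (residues 121 to 128, PMAIIELV in both rows) this gives 8 identities out of 8 positions.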

Example 2265 

A DNA sequence (GBSx2396) was identified in S.agalactiae <SEQ ID 7001> which encodes the amino
acid sequence <SEQ ID 7002>. This protein is predicted to be mercuric reductase. Analysis of this protein
sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2384 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA83977 GB:AF138877 mercuric reductase MerA [Bacillus sp. RC607]
Identities = 29/33 (87%) , Positives = 32/33 (96%)

Query: 4 VGI.TEEQAKEKGYDVKTSVLPLXAVPRAIVNRE 36
VGLTE+QAKEKGY+VKTSVLPL AVPRA+VNRE
Sbjct: 520 VGLTEQQAKEKGyEVKTSVLPLDAVPRALVNEE 552

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2266

A DNA sequence (GBSx2397) was identified in S.agalactiae <SEQ ID 7003> which encodes the amino
acid sequence <SEQ ID 7004>. This protein is predicted to be mercuric reductase. Analysis of this protein
sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3016 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA70224 GB:Y09024 mercuric reductase [Bacillus cereus]
Identities = 146/194 (75%) , Positives = 175/194 (89%)

Query: 2 PQISGLEKMDYLTSTTLLELKKIPKRLTVIGSGYIGMELCSQLPHHLGSEITLMQRSERLL 61
P I GL t+DYLTST+LLEIiKK+PKRIi VIGSGYIGMELGQLFH+IjGSE+TL+QRSERLL
Sbjct: 226 PNIPGLNEVDYLTSTSLIjELKKVPKRLWIGSGYIGMEIjGQLFHNDGSEVTLIQRSERLL 285

Query: 62 KEYDPEISESTOKaLIEQGIOTjWGaTFERVEQSGEIKRVYVTVNGSREVIESDQliLVAT 121
KEyDPBISBSVEK+L+EQGINI.VKGAT+ER+EQ+G+IK+V+V VNG + +IE+DQI1LVAT
Sbjct: 286 KEYDPEISESVEKSLVEQGIiSILVKGATYERIEQNGDlKKViroEVNGKKRIIEftDQLLVAT 345

Query: 122 GRKPOTDSUSLSaVAGVETGKNlTOILIiroFGQTSNEKIYAaGDVTIGP 181
GR ENT +IiNL AAGVE G EI+I+D+ +T+N +lYMGDVTLGPQFVYVftAY+GG+
Sbjct: 346 GRTPOTATiajLRaAGVEIGSRGEIIIDDYSRraNTRIYAAGDVTLGPQFVYVaAYQGGVa 405

Query: 182 TDHMGGIiNKKIDr. 195
IC^IQGLNKK+'fL
Sbjct: 406 AENAIGGIiNKKUIL 419

There is also homology to SEQ ID 1820.

There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2267 

A DNA sequence (GBSx2398) was identified in S.agalactiae <SEQ ID 7005> which encodes the amino 
acid sequence <SEQ ID 7006>. This protein is predicted to be triacylglycerol acylhydrolase. Analysis of this 
protein sequence reveals the following:

Possible site: 46
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3180 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2268 

A DNA sequence (GBSx2399) was identified in S.agalactiae <SEQ ID 7007> which encodes the amino 
acid sequence <SEQ ID 7008>. Analysis of this protein sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0544 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74453 GB:AE000234 orf, hypothetical protein [Escherichia coli K12]
Identities = 45/58 (77%) , Positives = 51/58 (87%)

Query: 1 MPWQNLLHAGQENLFSGLTALTAEFTVGEGKLMTHDBPCSMAPDDKHDLISGTCSHLP 58
+PWQNLLHAG+ENI1FSGLTAI1+AEFT+GEG+LM HD P KPD+ DLISGTCSHLP
Sbjct: 34 LPWQNLI^GEENLFSGLTALSMPTIGEGELMaHDVPLGCAPDEyDDLISGTCSHLP 91

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 2269 

A DNA sequence (GBSx2400) was identified in S.agalactiae <SEQ ID 7009> which encodes the amino 
acid sequence <SEQ ID 7010>. This protein is predicted to be transposase for insertion sequence element 
IS5. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2058 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MEQILPWQNMVEVIEPFYPKAGNGRRPyPLETMLRIHCMQHWYNLSDGAMEDALYEIASM 60 

^ffiQILPwal™^EVIEPFypKaGKGRRpypr^^Ima^IHCMQHWMS^ 

Sbjct: 40 MEQILPWQNMVEVIEPFYPKftGNGRRFYPLEriMIjRlHCMQHWYm 99 

Query: 61 RLFARLSLDS2U.PDRaTIMNFRHIiEQHQU«QLFKTI]roWLRESGVtIOT 120 

RLFARLSLDSALPDRTTIMNETlHLriEQHQI^QLFKTINRWLaEftGVM^^ 
Sbjct: 100 RLFiU«LSLDSM.PDRTTimFRH]:iEQHQiaRQLFKTIlSIRWi:iaEaGVMK^ 159 

Query: 121 EAPSSTKNKEQQRDPB^fflQTKEGNQWHFGMKAHIGTOAKSGLTHSLVTTAaNEHDIJJ^ 180 

EAPSSTKNKEQQRDPEm(>TKKGNQiraFGMKaHIGVDAKSGLTHSLVTTAAN^ 
Sbjct: 160 EAPSSTKNKEQQRDPEMHQTKKGNQWHPGMKaHIGVDAKSGLTHSLVTTAflNEHDtNQr.G 219 

Query: 181 NLLHGEEQFVSiU5AXyQGRPQREEIAEVDVDWLIAERPGKVRTrjKQHPRKNICrAINlEYM 240 

NLLHGEEQFVSADA yQGMQEEELAEVDVDWLIAERPGKVRTLKQHPRKNKTAINIEYM 
Sbjct: 220 NLLHGEEQFVSADAGXQGAPQREELREVDVDWLIAERPGKVRTLKQHPRKNKTAINIEYM 279 

Query: 241 KASIRARVEHPFRIIKRQFGFVKARYKGIiIiKNIMQLftraiFTl^^ 299 

XASIRARVEHPFRIIKRQFeFVKARYKiBLLKiqnNQIi2W.FTI^^ 
Sbjct: 280 KASIRaRVEHPPRIIKRQFGFVKARYKBIJ:jaroNQIAMLFTLANLPRADQM 338 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 

vaccines or diagnostics. 

Example 2270 

A DNA sequence (GBSx2401) was identified in S.agalactiae <SEQ ID 7011> which encodes the amino
acid sequence <SEQ ID 7012>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CMa51958 GB:Mil09661 putative eukaryotic-type serine/threonine
protein kinase [Streptomyces coelicolor A3(2)]
Identities = 49/169 (28%) , Positives = 90/169 (52%) , Gaps = 6/169 (3%)

Query: 23 PTTIRTODVSNKTVftQRKMTLENSGrjKWGaiRNIESDSVSEGLVVKTDPA^ 82

P T+++PDV+ + +A+ LE+ GL+ G + SD V+ G V+ T P +G + R G+ 
Sbjct: 469 PDTVKLPDVTGYKLDKaRTLIJIDEGUEPGMVTRAFSDEVaRGPVISTKPGSGTTV^ 528 

Query: 83 VNLYIATPNKSFTLCaWKEHNYlOjrrOTLQGKGViaCSDIKVKRKIMMDYTTO^ 142 
VL + + ++++. +L+G G+K + ++N++Y +G + A+ P

Sbjct: 529 VAL-WSKBSPVr»TODVTGDDLDEARaELBGaGIiK--VKrADERVNSEYDSGRV-ARQTP 584 

Query: 143 EGTSFWPDGNKKLTLWAVNDPMI-MPDTOSMTVGEn^IETLTDMLTM 190 
E +G+ +TLTV+ MI +PDV G+V+ +LDG + D 

Sbjct: 585 EPGGRAAKGD-TVTLTVSKGPRMIEVEDWGDSVDH&KQKLEnaGFEVD 632

Identities = 45/161 (27%) , Positives = 80/161 (48%) , Gaps = 4/161 (2%)

Query: 27 RVPDVSNKTVAQaKMTLENSGLKVGAIRNIESDSVSEGLVVKTDPAAGRSRREGAKVNIjY 86 
+VP + +KT AQa+ L+++GI1 VG +R+ SD+V G V+ TDP G R+ V+L 
Sbjct: 405 KVPPLtiSKTEaQARDRLDDAGLDVGKVRHAYSDTVERGKVISTDPGVGDRIRKNDSVSLT 464

Query: 87 IATPNKSFTLGNYKEHNYKDILKDLQGKGVKKSLIK7KRKINKDYTTGTIIAQSLPEGTS 146 

++ + L + + L+ +G++ + V R +++ G +++ GT+ 

Sbjct: 465 VSDGPDTVKLPDVTGYKLDKARTLLEDEGLEPGM--VTRAFSDEVARGFVISTKPGSGTT 522 


Query: 147 FNEDCajKKLTLTVAVNDPMIMPDVTGMTVGEVIETLTDLGL 187 

+ L V+ P+ +PDVTG + B L GL 
Sbjct: 523 VR--AGSAVALWSKGSPVDVPDVTGDDI,DEARAEIjEGaGL 561 

There is also homology to SEQ ID 3026.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2271 

A DNA sequence (GBSx2402) was identified in S.agalactiae <SEQ ID 7013> which encodes the amino 
acid sequence <SEQ ID 7014>. Analysis of this protein sequence reveals the following:

tt uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9311> which encodes amino acid sequence <SEQ ID 9312> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90561 GB:AE001058 glutamine ABC transporter, ATP-binding 
protein (glnQ) [Archaeoglobus fulgidus]
Identities = 142/219 (64%) , Positives = 178/219 (80%) 

Query: 1 miHQGEVVVIIGPSGSGKSTFLRTMNLLEVPTKGTVTFEGIDITDKKIJDIFKMREKMGM 60 

M + +GEVWIIGPSGSGKST LR +M LE PT G + +G+DIT+ K DI K+R+++G+ 
Sbjct: 24 MKVEKGBVWIIGPSGSGKSTLLRCINRLEEPTSGKILLDGVDITNSKIDINKVRQRIGI 83 

Query: 61 VFQQENLFPNMTVLENITLSPIKTKGLSNLDAQTKAYELLEKVGLKEKANTYPASLSGGQ 120

VFQQENLFP++T L+N+TL+PIK K +S +A+ r,LEKVGL++KA+ YPA LSGGQ 

Sbjct: 84 VFQQFNLFPHLTMiQNVTLaPIKIKKMSKREAEELGMRLi:.EKVGLEDKaDYYPAQLSGGQ 143 

Query: 121 QQRIaIflRGLAraPDVIJ:lFDBPTSALDPE^WGBVLTVMQDLaKfiGMT^IVIVTHEMGFi^^ 180 

QQR+AIflR LAMNP+V+LFDE TSALDPE+V EVL VM+ IiA+ GMTMV+VTHEMGFARE 
Sbjct: 144 QQRmiAIUaAMNPEVMLFDEOTSALDPELVKEVLDVMKQIARI^ 203 

Query: 181 VaDRVIFMDAGIIVEQGAPKEVFEQTKEIRTRDFIiSKVL 219 

V DRVIFMD G+IVE+G P+++F K RTR FLS +Ii 
Sbjct: 204 VGDRVIFMDGGVIVEEGKPEQIFSNPKHERTRKFLSMIL 242 
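
Each alignment is introduced by a summary line of the form `Identities = a/b (p%)`. Because extraction damage can affect the digits, it can be useful to recompute the percentages from the fractions and compare them with the printed values. The sketch below assumes only that summary-line layout; the function name `summary_fractions` is an illustrative assumption.

```python
import re

# Matches "Identities = 142/219", "Positives = 178/219", "Gaps = 6/169", etc.
FRACTION_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def summary_fractions(line):
    """Map each label on a summary line to (count, length, percent)."""
    out = {}
    for label, num, den in FRACTION_RE.findall(line):
        n, d = int(num), int(den)
        out[label] = (n, d, 100.0 * n / d)
    return out


# Summary line taken from the glnQ alignment in Example 2271.
line = "Identities = 142/219 (64%) , Positives = 178/219 (80%)"

for label, (n, d, pct) in summary_fractions(line).items():
    print(f"{label}: {n}/{d} = {pct:.1f}%")
```

A recomputed value that differs noticeably from the printed percentage suggests that one of the digits was misread and that the record should be checked against the original publication.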

There is also homology to SEQ ID 1186. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2272 

A DNA sequence (GBSx2403) was identified in S.agalactiae <SEQ ID 7015> which encodes the amino
acid sequence <SEQ ID 7016>. This protein is predicted to be 4-hydroxy-2-oxoglutarate aldolase (kdgA).
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1475 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14127 GB:Z99115 deoxyphosphogluconate aldolase [Bacillus subtilis]
Identities = 21/62 (33%) , Positives = 38/62 (60%) , Gaps = 4/62 (6%) 

Query: 3 QLMQGKIV&VIRGNSQEEAFQAAQACIKGGISAIEIA-miSKRSQVIEQLVTQYTNQEQV 62 

+L + K++AVIR ++EA Q ++ + GI A+E+ YT AS +IE + N+E + 

Sbjct: 9 RIiKBAKLIAVIRSKDKQEACQQIESIiIiDKGIRAVEVTYTrPGRSDIIE SFRNEBDI 64 

Query: 63 W 64 

Sbjct: 65 LI 66 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2273 

A DNA sequence (GBSx2405) was identified in S.agalactiae <SEQ ID 7017> which encodes the amino
acid sequence <SEQ ID 7018>. This protein is predicted to be H repeat-associated protein (rfbQRS)
(b1458). Analysis of this protein sequence reveals the following:

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0207 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is homology to SEQ ID 504.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.


Example 2274 

A DNA sequence (GBSx2406) was identified in S.agalactiae <SEQ ID 7019> which encodes the amino
acid sequence <SEQ ID 7020>. Analysis of this protein sequence reveals the following: 
Possible site: 14 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.74 Transmembrane 2 - 18 ( 1-21) 
INTEGRAL Likelihood = -3.03 Transmembrane 73 - 89 ( 73 - 92)
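
The `INTEGRAL` lines report a likelihood score and the residue range of each predicted membrane-spanning segment. A minimal sketch of extracting those ranges is given below, assuming the layout `INTEGRAL Likelihood = <score> Transmembrane <start> - <end>`; the function name `transmembrane_segments` and the dictionary keys are illustrative assumptions.

```python
import re

# Assumed layout of one predicted-segment line.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text):
    """Return one dict per predicted segment with likelihood, start, end, length."""
    segments = []
    for m in TM_RE.finditer(text):
        start, end = int(m.group(2)), int(m.group(3))
        segments.append(
            {
                "likelihood": float(m.group(1)),
                "start": start,
                "end": end,
                "length": end - start + 1,  # residue range is inclusive
            }
        )
    return segments


# The two segment lines reported for GBSx2406 in this Example.
example = """\
INTEGRAL Likelihood = -6.74 Transmembrane 2 - 18 ( 1-21)
INTEGRAL Likelihood = -3.03 Transmembrane 73 - 89 ( 73 - 92)
"""

for seg in transmembrane_segments(example):
    print(seg)
```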

Final Results

bacterial membrane --- Certainty=0.3697 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
There is also homology to SEQ ID 3376. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2275 

A DNA sequence (GBSx2407) was identified in S.agalactiae <SEQ ID 7021> which encodes the amino 
acid sequence <SEQ ID 7022>. This protein is predicted to be insertion element IS1 protein InsB (insB_5).
Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4280 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2276 

A DNA sequence (GBSx2409) was identified in S.agalactiae <SEQ ID 7023> which encodes the amino 
acid sequence <SEQ ID 7024>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3937 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2277 

A DNA sequence (GBSx2410) was identified in S.agalactiae <SEQ ID 7025> which encodes the amino 
acid sequence <SEQ ID 7026>. This protein is predicted to be triosephosphate isomerase (tpi). Analysis of 
this protein sequence reveals the following: 
Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 35 - 51 ( 35 - 51)



Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0. (

bacterial cytoplasm --- Certainty=0. (

The protein has homology with the following sequences in the GENPEPT database.

>GP:AaC43268 GB:T307 6 4 0 triosephosphate isomerase [Lactococcus 
lactis] 

Identities = 50/75 (66%) , Positives = 51/75 (80%) 



lAGNWKMNK EA+AF+EAV + LPSS+ VE+ 1 APAL L+ + +GSELK+Aa+N 
Sbjct: 7 lAGlMMNKmiSEAQAFVERVKIMLPSSiaffV^ 66 

Query: 66 SYFENSGftFTGENSP 80 
£ 

Sbjct: 67 £ 

There is also homology to SEQ ID 6838:

Identities = 58/77 (75%), Positives = 58/77 (87%) 

Query: 6 lAGNWKMNKNPEEAKAFIEAVMKLPSSELVEaGIiaPALTLSTVLEftAiaSSELKI^^ 65 

XAGNWKMNKNP-l-E%KAF+E3^VASKLPS+-)-LV+ +AAPA+ L T +EAAK S LK+AAQM 
Sbjct: 7 IAeNWKMNimPQEAKAFVER\ffiSKLPSTDLVDVA\mAPAVDLVTriEft^ 66 

Query: 66 SYFENSGftFTGENSPKV 82 

YFEN+GAFTGE SPKV 
Sbjct: 67 CYFENTGAFTGETSPKV 83 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2278 

A DNA sequence (GBSx2412) was identified in S.agalactiae <SEQ ID 7027> which encodes the amino 
acid sequence <SEQ ID 7028>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>» Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.39 Transmembrane 96 - 112 ( 96 - 112) 

Final Results 

bacterial membrane --- Certainty=0.1956 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BMil4368 GB:D90354 surface protein antigen precursor
[Streptococcus sobrinus]
Identities = 50/129 (46%) , Positives = 76/129 (58%) , Gaps = 18/129 (13%)


Query: 3 ISFDNSFLEWSDDSaBQADVYLQMKKIAAGQVEDNrTYLHTVNGWISSNT^^ 62 

++F FL +VS DSAFQA+VYLQMKRIA G NTY++TVNG SSNTV T TP+P++ 
Sbjct: 1442 VTFKEDFLRSVSVDSAFQftETOiQMKRIAVICTFflOTTOmnJGITYSSlSrrTO^^ 1501 

Query: 63 PSPNQP TPPQPPIETIEPPVPASILENTGEQES LLGLIG- -AGILLGT 108

PSP P P Q PP A LP T0+ + LLGL+ AG L 

Sbjct: 1502 PSPVDPKTTTTWFQPRQGKAYQPAPPAGRQ-LPATGDSSNAYLPLLGLVSLTAGFSL— 1558 

Query: 109 AYGUCKKEE 117 
GL++K++

Sbjct: 1559 -LGLRRRQD 1566 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 2279 

A DNA sequence (GBSx2413) was identified in S.agalactiae <SEQ ID 7029> which encodes the amino 
acid sequence <SEQ ID 7030>. Analysis of this protein sequence reveals the following:

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3691 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9359> which encodes amino acid sequence <SEQ ID 9360>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15793 GB:Z99123 phosphotransacetylase [Bacillus subtilis] 

Identities = 131/221 (59%), Positives = 169/221 (76%), Gaps = 2/221 (0%)

Query: 6 LVDPVILGKADEVHDSXJmLGFVDQDYSIIDPEQYEKFEEmEAFVEIRKBKATMEDADR 65
+++P+++G +E+ L I DP YE E++ +AFVE RKHKRT E A +
Sbjct: VIOTmGI!^EIIEIQAKaKE]mlTIlGGVKIYDPHTYEG^EDLVQAFVERRK^KATEE 100

Query: 66 LLKDVNYFGVMiVKMIjmGlWSGAIHSTADTVRPALQIIKTKPGISRTSGWLMl^^ 125
L D NYFG MLV GLaDG+VSOA HSTADTVRPALQUKTK G+ +TS6VF+M R
Sbjct: 101

Query: 126
+E+Y+FADCAINI P++Q+LAEIA+ +A+TAK+FDI+P++AMLSFSTKBSAK+ + EKV
Sbjct: 159 EEQYVFADOVINIAPDSQDIJiErAIESAlSrrAKMFDIEPRVaMLSFSTKBSAKSDETEKV^ 218

Query: 186 EAAKIAKDLSPBLAVDGELQFnAAFVPETAEIKAENSDVRG 226
+A KIAK+ +PBI1 +DGE QFDAABVP AE KAP+S++ G
Sbjct: 219 nAVKIAKEKAPELTLDGEFQFDAAFVPSVAEKKAPDSEIKB 259

A related DNA sequence was identified in S.pyogenes <SEQ ID 7031> which encodes the amino acid
sequence <SEQ ID 7032>. Analysis of this protein sequence reveals the following: 
Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3182 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 181/227 (79%) , Positives = 211/227 (92%) 

Query: 1 MKFEGLVDPVILGKMEVHDSiaRLGFVDQDYSIIDPEQraKPEBMKEASVBIRKiGK^ 60 

+KFKGIJ++P+ILG+++EV + L +LGF DQDY+Il+P +Y F++MKEAFVE-I-RKGKA.T+ 
Sbjct: 38 LKPEC3IlI^PIIIXMSEEVRNLLTKI/3FMQDyTIINP^IEYJ«)FDKMKEAPVEWKGKaTI. 97 

Query. 61 EDADRLLKDVNYFGV^ILVKMIJfflG^WSGAIHSTJUDTVRPAI:C 120 

EDM++L+DVNyFGVMLW+aiJU3GMVSGAIHSTJVDTVRPALQIIKTKPGISRTSGVFLM 
Sbjct: 98 EDADKMLRDVHYFGVmVKMGIJADG^WSGaIHSTADTVRPArlQIIKTKPGI3RTSGVFLM 157 

Query: 121 NRENTQERYIFADCAINIDPNAQELAEIAVNTADTAKIFDIDPKIAMLSFSTKGSAKAPQ 180 

NRENT ERY+FADCAINIDP AQEJjAEIAVMTA+TAKIFDIDPKIAMLSFSTKGS KAPQ 
Sbjct: 158 NRENTSERYVFADCRINlDPTAQELAEiaVNTAETAKIFDIDPKIAMLSFSTKGSGKAPQ 217 

Query: 181 AEKVQEAflKIAKDLSPErLRVDGELQFDaAFVPETAEIKAPNSDVaGK 227 

+KV+EA +IA L+P+LA+DGELQPnRAFVPETA IKAP+S VRG+ 
Sbjct: 218 VDronSEATEIATGLNPDLRLlDGELQFDZiRFVPETAAIKMDSAVRGQ 264 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2280

A DNA sequence (GBSx2414) was identified in S.agalactiae <SEQ ID 7033> which encodes the amino 
acid sequence <SEQ ID 7034>. This protein is predicted to be lipopolysaccharide biosynthesis protein- 
related protein. Analysis of this protein sequence reveals the following: 

no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4076 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

= 20/176 (11%) 

Query: 1 MKVLLYLEAEEYLKKSGIGRAIKHQEKaLQIAGIDYTTNPT- — 41 

M+ L YLE& E L4- G+ A Q AIri- ++ P 
Sbjct: 2 MRAIJTYLEAAEALR-GGMVTATNQQRAALETTDVEVVETPWRAGDPVRSiaSLAAGGSCF 60 

Query: 42 DDFDL^mMOTYGIRSV&LMSKAKKTGKK^rIIffiGHSTEEDFia^SFIGSNI,VSPLFKWYLCR 101 

FD+ H N G S + A++T +++H H T EDF SF GS+ ++P + YL 
Sbjct: 51 TAFDVAHCNLVGPGSVAVARHARRTDTPLVLHAHLTREDFAQSFRGSSTIAPALEPyLRW 120 

Query: 102 FYQKADAIITPTDYSKQLIKAYGIKKPIFVLSHGIDLSRYQXSEKKESAFRHYFHL 157 

FY +AD ++ P++Y+K +++AY + PI LSNG+DL Q E + R F L 
Sbjct: 121 FYSQADLVLCPSEYTIOJVLRAYPVDAPIRQLSNGVDLESMQGYESFRADTRARFDL 176 



There is also homology to SEQ ID 1220. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2281 

A DNA sequence (GBSx2415) was identified in S.agalactiae <SEQ ID 7035> which encodes the amino
acid sequence <SEQ ID 7036>. Analysis of this protein sequence reveals the following:
Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2525 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC35010 GB:AF0559a7 intracellular a-amylase [Streptococcus mutans]
Identities = 27/46 (58%) , Positives = 33/46 (71%) 

Query: 1 MEVGEIYaGRTFVDTfliGNCEQEVVIGDDCSPraDFLVESASISAWVPK 46 

M +C3E K FVOTCj NC +EV++ D GWGDF V+ AS+SAWV K 
Sbjct: 438 M»INKEFMSinCWDyiilMnBWILDDQGWCmFPVQiMLSAWVllK 483 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2282

A DNA sequence (GBSx2416) was identified in S.agalactiae <SEQ ID 7037> which encodes the amino
acid sequence <SEQ ID 7038>. This protein is predicted to be RopA. Analysis of this protein sequence 
reveals the following: 

Possible site: 24 

»> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2082 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 6908: 

Identities = 30/35 (85%) , Positives = 33/35 (93%) 

Query: 1 MEADQVEGLLSADMLKHDIAMKKAVDVITSSATVK 35

M RDQVR LLSADMLKHDI2»4KKRV+VITS+A+VK 
Sbjct: 422 MPADQVRSLLSADMLKHDIAMKICAVEVITSTASVK 456 
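
In each alignment the middle line repeats a residue wherever Query and Sbjct agree, and the `Identities` count is the number of such columns. The sketch below shows that bookkeeping for two already-aligned strings. It is an illustration only: the function name `identity_stats` and the toy sequences are hypothetical, and the `+` marks for conservative substitutions are not reproduced because they depend on a scoring matrix that is not given in this text.

```python
def identity_stats(query, sbjct):
    """Count identical columns and gap columns in a pairwise alignment.

    Both strings must already be aligned (equal length, '-' for gaps).
    Returns (identities, gaps, alignment_length, midline).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")

    identities = 0
    gaps = 0
    midline = []
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1
            midline.append(" ")
        elif q == s:
            identities += 1
            midline.append(q)  # identical residue is echoed on the midline
        else:
            midline.append(" ")  # mismatch; '+' scoring is deliberately omitted
    return identities, gaps, len(query), "".join(midline)


# Hypothetical toy alignment, not taken from the sequence listing.
q = "MKV-LLA"
s = "MRVALLG"
ident, gaps, length, mid = identity_stats(q, s)
print(f"Identities = {ident}/{length}, Gaps = {gaps}/{length}")
print(q)
print(mid)
print(s)
```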

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2283 

A DNA sequence (GBSx2417) was identified in S.agalactiae <SEQ ID 7039> which encodes the amino 
acid sequence <SEQ ID 7040>. This protein is predicted to be DNA-directed RNA polymerase, subunit 
delta. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2407 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MELEVFAGQEKSELSMIEVKRAILEQRGRDiraWFSDLVNDIQTYLGKSDSAIRESIjPFF 60
M ++ ++ +E E4-++4-E+A + E+ + + F +L+N+1 + LG + + + F
Sbjct: 1 MGIKQYSQEELKEMALVEIAHBLFEEHKKP--VPFQELIOTIASIiGVKKEELGDRIAQ 58

Query: 61
Y+DIiN DG F+ L + WGLRSWY D-1-+DEE
Sbjct: 59

Query: 121
h D +D D E L+ + ++ D+E + + D EI E I DED DED
Sbjct: 112

Query: 181
Sbjct: 166

A related DNA sequence was identified in S.pyogenes <SEQ ID 7041> which encodes the amino acid 
sequence <SEQ ID 7042>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2253 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 162/191 (84%) , Positives = 181/191 (93%) , Gaps = 1/191 (0%) 

Query: 1 MELBVERGQEKSELSMIEVARAILEQRGRDNEMyFSDLVNDIQTYLGKSDSAIRESLPFF 60 

++L+VFAGQEKSELSMIBVARAILE+RGRDNEMyFSDLVN+IQ YLGKSD+ IR +LPFF 
Sbjct: 12 LKLDVERGQEKSELSMIE\CUlAILEERGRDNEMyFSDLVNEI(3raX3KSDaGIRHaLPFF 71 

Query: 61 YSDIOTBGSFIPLGEHKTOLRSWYAIDEIDEEIITLEEDEDGRPKRKKKEVNABMDGDED 120 

Y+DUraK3SFlPLGENKWGLRSWYAIDEIDEEIITLEEDEIX3A KRKKKRVNAEMDGDED 
Sbjct: 72 YTDLJraiGSFIPLGENKWGLRSm'AIDEIDBEIITLEEDEDGaQKRKKKRVNAFMDGDED 131 

Query: 121 AIDYNDDDPEDEDFTEETPSLEYDEENPDDEKSEVESYDSEINEIIPDEDLDEDVEIKEE 180 

AIDY DDDPEDEDFTEE+ +EYDEE+PDDEKSBVESYDSE+NEIIP++D E+V+INEE 
Sbjct: 132 AIDYRDDDPEDEDFTEESRE\7EYDEED!?DDEKSEVESYDSE1MEIIPEDDF-EE\7D1MEE 190 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2284

A DNA sequence (GBSx2418) was identified in S.agalactiae <SEQ ID 7043> which encodes the amino 
acid sequence <SEQ ID 7044>. This protein is predicted to be CTP synthetase (pyrG). Analysis of this 
protein sequence reveals the following: 

Possible site: 23

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.11 Transmembrane 5 - 21 ( 5 - 21)

Final Results

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA09021 GB:AJ010153 CTP synthetase [Lactococcus lactis subsp.
cremoris] (ver 2)
Identities = 421/533 (78%) , Positives = 481/533 (89%)



Query: 2 TKYIFVTGGWSSIGKGIVaASLGRLLKNRGLKVTIQKFDPYINIDPGTMSPYQHGEVYV 61
TKYIFVTGG SS+GKGIVAASLGRLI.KNRGI.KVT+QKFDPY+NIDPGTMSPYQHGEV+V
Sbjct: TKYIFVTGGGTSSMGKGIVAASLGRIjLKNRGLiarr^/OKFDPYUMIDPGTMSPYQHGEVFV 62

Query: 62
Sbjct: 63

Query: 122
Sbjct: 123

Query: 182
+AAGE+KrK Q++ K LR GIQ NMLV+R+E P
Sbjct: 183

Query: 242
V+H+YQIPLN+QAQNMDQIVCDHLKL+ P ADM EWSAMVD VMNL+KKVKIALVGKYVE
Sbjct: 243

Query: 302
LPDAY+SV EALKH+GY +D +D+ WVNA +VTH-+N+ ELVGDA GIIVPGGFGQRG+E
Sbjct: 303

Query: 362
GKI AI+'i(aRENDVPMLG+CLGMQLT VEFARMVL L OA+S ELDP+T +P+IDIMRDQ
Sbjct: 363

Query: 422
Sbjct:

Query: 482
SGVSPDMRL+E+VEL KKFFVA QYHPELQSRPN EELYT F+ AVEN K
Sbjct: 483



A related DNA sequence was identified in S.pyogenes <SEQ ID 7045> which encodes the amino acid
sequence <SEQ ID 7046>. Analysis of this protein sequence reveals the following:



Possible site: 23
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -0.11 Transmembrane 5 - 21 (

Final Results

bacterial membrane --- Certainty=0.1044 (Affirmative)
bacterial outside ---
bacterial cytoplasm ---

The protein has homology with the following sequences in the databases:

>GP:CAA09021 GB:AJ010153 CTP synthetase [Lactococcus lactis subsp.
cremoris] (ver 2)
Identities = 423/532 (79%) , Positives = 483/532 (90%) 

Query: 2 TKYIFVTGGWSSIGKGIVMSLGRl.LKI*mGLKOTIQKET)PYINIDPGTMSPyQHGEVYV 61 

TKYIFVTGG SS+GKGIVAASLGRLLKNRGLKVT+QKFDPY+NIDPGTMSPyQHGEV+V 
Sbjct: 3 TKYIFVTGGGTSSMGKGIVMSLGRLLKNRGLKVTVQKFDPYLNIDPGTMSPyQHGEVFV 62 

Query: 62 TDDGaETDI£lLGHXERFIDlHIiNKySNVTTGKIYSEVLRKERKGEYIiGfl.TVQVIPHITDA 121 

TDIX3affiTDLDIK3HYERFIDINIiNKySNVT+GK+YSE+LRKERKGEyLGAWQ++PH+T^ 
Sbjct: 63 TDDGAETDIJJLGHYERFIDlIMKYSNVTSGKVYSEILRKERKGEYLGATVQlWPHVraM 122 



Sbjct: 123 LKEKIia«AATTTnADIIITEVGGTVGDhffiSLPFIEALRQMKAEVGAni™wiimrt>im^ 182 

Query: 182 KRAGB^m'KPTQHSVKELRGIlGIQPl#ILVIRTEEPVEQGIKNKLaQFCDVNSEaVIESRD 241 

■l-AaGE+RTK Q++ K LR GIQ HMLV+R+E P+ +++K+A FCDV EAVI+S D 
Sbjct: 183 RAAGBLKTKIAO^IATKTLREYGIQa^^MLVIa^SEVPITTEMRDKIflMFaJ^ 242 

Query: 242 VEHLYQIPimQAQSMDQIVCBHLKUffiPQADMTEWSftMVDKVM]^ 301 

VEHLYQIPLNLQaO+MDQIVCDHLKL+AP+flDM EWSAMVD VMNL+K KIALVGKYVE 
Sbjct: 243 VEHLYQIPI^^LQA(SM3QIVa5HLKLnAPKaDMAEWSaMVraV^ml:^KKKOT 302 

Query: 302 LPmYLSVVEALKHSGYaiTOTMDLKWVNRNDVTVI^^ 361 

LPDAY+SV EALKH+GYA+D +D+ WVNaNDVT +N A+L+GDA GIIVPGGFGQRGTE 
Sbjct: 303 LPDAYlSVTEALKHAGYASI»EVDINWV]!ilAND\m3 362 

Query: 362 SKIQAIRYARENDVPMLGICLGMQLTCOTlFARHVIJMGaNSFELEPSTKYPIIDlMEI^ 421 

GKI AI+YARENDVPMLGICLGMQLT VEFAR+VL +EGa+SFEL+P TKYP+IDIMRDQ 
Sbjct: 363 GKIAAIKYARENDVPMLGICLGMQLTAVEFARimjGLEGaHSFEmPETKYPVIDIMRDQ 422 

Query: 422 IDIEDMGGTLRI/SLYPCKLKPGSKAAMAYmQEVVQRRHRHRYEFIilNKFRPEFBAAOFVF 481 

+D+EDMGGTLRLGLYP KLK GS+A AYN+ KWQRRHRHRYBFNNK+R +FE AGFVF 
Sbjct: 423 VDVEDMGGTLRLGLYPAKLKNGSRAKAAYNDAEWQRRHRHRYEFMNKYREDFEKAGFVF 482 

Query: 482 SGVSPDHRLVEIVELKEKKFFVAAQYHPELQSRPMRPEEIjYTAFVTAAIKNS 533 

SGVSPDNRLVEIVEL KKFFVA QYHPELQSRPNRPEELYT F+ A++NS 
Sbjct; 483 SGVSPDNRLVEIVELSGKKFFVACQYHPELQSRPNRPEELYTEFIRVAVENS 534 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 477/532 (89%) , Positives = 503/532 (93%) 

Query: 1 MTKYIFVTGGWSSIGKGIVAaSLGRLLKNRGLKVTIQKFDPYINIDPGTMSPYQHGEVY 60 

MTKYIFVTGGWSSIGKGIVAASLGRLLKNRGLKVTIQKFDPYINIDPGTMSPyQHGEVY 
Sbjct: 1 MTKYIFVTGGWSSIGKGlVAASLGRIiLKNRGLKVTIQKFDPYIHIDPGTMSPYQHGEVY 60 

Query: 61 VTDDGAETDLDLGHYERFIDIIMKYSNVTTGKIYSEVLKKERRGEYICaTVQVIPHVTD 120 

VTDDGAETDLDLGHYERFIDINIJIKYSNVTTGKIYSEVIri-KER+GEYLGATVQVIPH+TD 
Sbjct: 61 VTIXKSAETDIiDLGHYERFIDiraMKYSimTGKIYSBVIiRKERKBEY^ 120 

Query: 121 ALKEKIKRAATTTDSDVIITEVGGTOaDIESLPFLEAIBQMKADVGSDNVMyiHTTLLPY 180 

ALKEKIKRAA+TTDSDVIITEVGGTVGDIBSLPFLEALRQMKADVGS+IWMYIHTTLLPY 
Sbjct: 121 RLKEKIKRAASTTDSDVIITEVGGTVia31ESLPFLERIiRQ|yiKA0VGSENVMYIHTTIiPY 180 

Query: 181 LKaAGEMKTKPTQHSVKEIJlGL6IQPNMLVIRTEQPAaQSIKHKLAQFCD\MEA^ 240 

LKaAGEMKTKPTQHSVKELRGLGIQPNMLVIRTE+P Q IKNKLAQFCDV BAVIES 
Sbjct: 181 LKaAGEMKTKPTQHSVKElJIGlXSXQPlMiVIRTEEPVEQGIKinCIAQFCDVNSEAVIESR 240 



Query: 241 DVDHlYQIP»1QAQNra)QIVaDHI.K]^PAaDMi™SAMVD 300

0V+H+YQIPM+QAQ+MDQIVCDHLKL P ADMTEWSRMVDKVMtIL K KIALVGKYV 
Sbjct: 241 DVEHLYQlPMLQAQSMDQIYCDHLKLi^APCRDMTEVISMvrroiOTMMLR^^^ 300

Query: 301 ELPDAYLSVVESJ^KHSGYVNDVAIDLKWVNaaEVTEDNIKELVGIfflDGIIVPGGFGQRGS 3G0 

ELPDAYLSWEALKHSGY ND AIDliKWraa +VT DN +I<+GDflDGIIVPGGFGQRG+ 
Sbjct: 301 ELPmYLSVVEALKHSGYMDTAIDLKlTOBOTmVDNAADLIiGIWJGIIVPG^ 350 

Query: 361 EGKIEaiRYARENDVPMLGVOJGMQLTCWEFARNVLlSn^ 420 

EGKI+AIRYARENDVPMLG+CLGMQLTCVEFAR+VIJSr+ GANS EL+P T +PIIDIMRD 
Sbjct: 361 EGKIQAIRYARENDVPmGICI^QLTCTEFAEHVIJaMEGANSFELEPSTKYPIlDIMRD 420 

Query: 421 QIDIEDMGGTIiRLGIiYPCKLKSGSRAAAAYiraQEWQRRHRHRYEEISPrKFREQFEflAGFV 480 

QIDIEDMGGTLRI.GLYPCKLK GS+AA AYNNQEWQRRHRHRYEFN KFR +FEAASFV 
Sbjct: 421 QIDIEDMGGTIiRLGDYPCKLKPGSKAAMRYNNQEWQRRHRHRYEFNNKFRPEFEAAGFV 480 

Query; 481 FSGVSPDNRIJffiWElLPEKKFFVaAQYHPELQSRENHAEEriYTAFVTAAVEN 532 

FSGVSEDHRL+E+VEL EKKFFVAAQYHPELQSRPN EELYTAFVTA&++N 
Sbjct: 481 FSGVSPDNELVEIVEDQKEKKFFVARQXHPELQSRPNRPEELYTAFVTAAIBOI 532 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2285 

A DNA sequence (GBSx2419) was identified in S.agalactiae <SEQ ID 7047> which encodes the amino 
acid sequence <SEQ ID 7048>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -9.92 Transmembrane 13 - 29 ( 3 - 34)



Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9285> which encodes amino acid sequence <SEQ ID 9286> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

-- 8/289 (2%) 

Query: 1 MKKIRLSKFIKMIWILFLISVAASFYFFHVAQVRDDKSFISNGQEKPGNSLYAYDKSFD 60
MKKI L+ I +V + I + S + + D+ I 4 G+ ++ +SF+
Sbjct: 1 MKKILLA- -IGALVTAVIAlGrVFSHMILFlKKOT)ED- -IIKRETDNGHDVF- - -ESFE 53

Query: 61 KTJiKQKIEMTNQMIKQVAWYVPAVKKTHKTAVVVHGFANSKENMKaYGWIjFHIM 120
++K ++ +YA TT++HG + N YLF LG+NVI1+
Sbjct: 54

Query: 121
D+ HG+S G+ yG+ +++++ K ++ +K N I + G SMG T ++ +G
Sbjct: 114

Query: 180
Y LP++PLL
Sbjct: 174

Query: 240
P LFIH D+++P S Y+ G K LYI 4
Sbjct: 234

A related DNA sequence was identified in S.pyogenes <SEQ ID 7049> which encodes the amino acid
sequence <SEQ ID 7050>. Analysis of this protein sequence reveals the following: 



Possible site: 24 
Seems to have an uncleavable N-term signal seq 
Likelihood = -7.48 Transmembrane 



26 ( 3 - 

Final Results

bacterial membrane --- Certainty=0.3994 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB14296 GB:Z99116 yqkD [Bacillus subtilis]
Identities = 88/295 (29%) , Positives = 145/295 (48%) , Gaps = 4/ 



LGILFLLITLISVGASFYFFHVaQIREEKSFItraKKRSTiraPLyPAEQSFDALPYEKRQL 6 9 
L I L+ +I++G F, H+ ++K+ + KR T+N + +SF+ + + 
LAIGAIiVTAVIAIG — IVFSHMILFIKKKTDEDIIKRETnNG-HDVFESFEQMEKTAFVI 62 



f G SMGA T ++ +G 



Query: 


10 


Sbjct: 


6 


Query: 


70 


Sbjct: 


63 




130 


Sbjct: 


123 






Sbjct: 


183 




249 


Sbjct: 


243 



DC +A ++L + 



FIH DD++P 



+A Y riP++PLL 



+Ha S+ N Y+K -t 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 203/294 (69%) , Positives = 246/294 (83%) 



Query: 1 MKKIRIiSJCFIKMIWILFLISVRJiSFYFFHVjyaVRDDKSFISiraQRKPGNSIiYAYDKSro 60
MK IR++K++ ++ +++ LISV ASFYFFHVAQ+R++KSFI+1I +R N LY -H+SFD
Sbjct: 1 MKTIRIAKYLGILFLLITLISVGASPYFPHVAQIREEKSFINNKKRSTNNPLYPAEQSFD 60

Query: 61
Sbjct: 61

Query: 121
Sbjct:

Query: 181
P+QV ++IEDC6y+SVWDELKFQAK MY LPAFPLLYEVS +SKIRaGFSYG+ASSV+QL
Sbjct: 181

Query: 241
P LFIHGDKD+FVPT MVYDNYKAT G KE+ IVKGAKHAKSFET PE-I-
Sbjct: 241



SEQ ID 9286 (GBS662) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 136 (lane 8-10; MW 63kDa) and in Figure 187 (lane 4; MW 63kDa).
GBS662-GST was purified as shown in Figure 237, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2286 

A DNA sequence (GBSx2420) was identified in S.agalactiae <SEQ ID 7051> which encodes the amino
acid sequence <SEQ ID 7052>. This protein is predicted to be aspartate-ammonia ligase (asnA). Analysis
of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2898 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9309> which encodes amino acid sequence <SEQ ID 9310>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22222 GB:U32738 aspartate--ammonia ligase (asnA) [Haemophilus influenzae Rd]
Identities = 24S/300 (82%) , Positives = 258/300 (89%)


Query: 1 MIDKLEIVEVQGPIIiSQVGDGMQDrajSGIEHPVSVKVmiPEaEFEOTHSIiMWKRHT^ 60 

+I++L I+EVQGPILSQVO+GMQDNLSGIE V V V IP A FEWHSLAKWKRHTrA 
Sbjct: 23 BIEQLGIIEVQGPILSQVCM3»K3DNLSGIEKRVQ\mKCIPiaVFE\AmSIJUOn^^ 82 

Query: 61 RFGETOGEXSLFVHMKALRPDEESLDPTHSVYVIXSMDWEKVIPDGRRNLD^ 120

RF F E BGLFVHMKALRPDEDSLDPTHSVYVDQWDMEKVIP+GRRN YLKETV IYh- 
Sbjct: 83 RENFKEDEXSLFVHMKALRPDEDSLDPTHSVYVDQWDVfflKVIPEGRRNFAYLKETVNSITO 142 

Query: 121 AIRLTELAVEaEFDIESILPKRITFIHTBELVEKYPDLSPKERENAIAKEYGAVFLIGIG 180 
AIRLTELAVEAREDI SILPK+ITF+H+E+I1V++YPDLS KERENRI KEYGAVFLIGIG

Sbjct: 143 AIRLTELAVEftRFDIPSir.PKQITFVHSEDLVKRYPDI>SSKERKNAICKEYGAVFLIGIG 202 

Query: 181 GEIJU3GKPHDGRAPDTODWTTPSENGFK!GIiNGDILVWNEQLGTAFEL3SMGIRVDEnaLK 240 
G+L+DGKPHDGRAPDYDDMTT SBNG+KGLNGDILVWN+QLG AFEIiSSMGIRVDE Alri- 

Query: 241 RQVVLTGDEDRLBFEWHKTLIiRGFFPLTIGGGIGQSRLaMFLIjRKKHIGEVQSSVWPKEV 300 

QV LTGDED L+ +VIH+ LL G PLTIGGGIGQSRLAM LtRK HIGEVQSSWPKE+ 
Sbjct: 263 LQVGLTGDEDHLKmWHQDIJ^KLPLTIGGGIGQSRLaMLUJlKKHIGEVQSSVWP™ 322 


A related DNA sequence was identified in S.pyogenes <SEQ ID 7053> which encodes the amino acid
sequence <SEQ ID 7054>. Analysis of this protein sequence reveals the following:

Possible site: 34 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.16 Transmembrane 189 - 205 ( 189 - 205)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC22222 GB:U32738 aspartate--ammonia ligase (asnA) [Haemophilus influenzae Rd]
Identities = 255/330 (77%) , Positives = 289/330 (87%) 


Query: 1 MKKSFIHQQEEISFVKNTFTQYLIAKLDVVEVQGPILSRVGDGMQDNLSGTEIIPVSVNVL 60
MKK+FI QQ+EISFVKNTFTQ til +L ++EV(3GPILS+VG+GMQDNLSG E V VNV
Sbjct: 1 MKKTFILQQQEISFVKNTFTQ^Stt)IEQLGIIEVQGPILSQVGNG^«D]!^JSGIEKR.VQVNVK 60

Query: 61
Sbjct: 61

Query: 121
KVIP+S+RN AYIiKETV +IY+ IRLTELAVEflR+DI ++IjPK+ITF+H+E4-LV +YPDL
Sbjct: 121

Query: 181 TPKERENAITKEKaWLIGIGGVLPIX3KPHDGRAPDYDDWriKi'KrK3YHG]M3DILV™ 240
+ KEREMAI KE+GAVFLIGIGG L DGKPHDGRAPDYDDWTIE+ENGy GliNGDILVWIJ
Sbjct: 181 )ILVVJN 240

Query: 241
DQLQ AFELSSMSIRVDE AL+ QV H-TGD+D L DWH+ UiNG PLTIGGGIGQSR+
Sbjct: 241

Query: 301
Sbjct:

WO 02/34771 PCT/GB01/04789



An alignment of the GAS and GBS proteins is shown below. 

Identities = 254/303 (83%) , Positives = 280/303 (91%) 

Query: 1 MIDKLEIVEVQGPILSQVGDGMQDNLSGIEEPVSVKVLNIPEflEPEWHSLAKWKRHTLA SO 

+I KL++VEVQGPILS+VGDaMQDNLSG E+EVSV VL IP A FEWHSIAKWKRHTLA 
Sbjct: 23 MMOiDVVBVQGPILSRVGDGWl&SGrENPVSVimjKIPmTFEVVHSLf^ 82 

Query: 61 RFGE^EGEGLFVHMKaLRPDEDSLDPTHSVYVDQWDWEKVIEDGRIa^J)YLKETVEKIyK 120 

RFGFNEGEGL V+MK2U^PDEDSI£I THSVYVDQWDWEKVIPDG+ENL YLKETVE lYK 
Sbjct: 83 RPGFNEGEGLVVl#IKALRPDEDSLDQTHSVYVDQWDWKKVIHKKE]SnAYI.KET\7ETIYK 142 

Query: 121 MRMEIAVEARFDIESILPKRITFIHTEELVEKYPDLSPKERENAIAKEYGRVFLIGIG IBO 

IRLTEIAVEaR+DIE++LPK+ITFIHTEELV KYPDL+PKERENAI KE+GRVFLIGIG 
Sbjct: 143 VIRLTELa.VEaRYDIEaVLPKKITFIHTEELV2UCYPDi:.TPKERENAITKEFGRVFLIGIG 202 

Query: 181 GEMXSKPHDGRAPDYDDm'TPSENGFKGiaTGDILVWNEQIXSTAFELSSMGIRVDED^ 240 

G L DGKPHDGRAPDYDDWTT +ENG+ GLNGDILVWN+QLG+AFELSSMGIRVDK+ALK 
Sbjct: 203 GVLPDGKPHDGRAPDYDDWTTETHNGYHGLNGDZLVWNDQLGSAFELSSMGIRVDEESJjK 262 

Query: 241 RQWLTGDEDRLEFEWHKTLLRGFFPLTIGGGIGQSRLAMFLLRKXHIGEVQSSVWPKEV 300 

RQV +TGD+DRL F+WHK+LL G FPLTIGGGIGQSR+ MFLLRK HIGEVQ+SVWP+EV 
Sbjct: 263 RQVBTmSDQDRLGFIOTKSIiIiNGLETLTlGGGIGQSRMVMFLLREQHIGEVQTSVWPQEV 322 

Query: 301 RDI 303 
RD+ 

Sbjct: 323 RDS 325 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
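
The Identities/Positives figures quoted for each database hit can be checked against their own fractions. The sketch below parses the summary line of the Haemophilus influenzae hit in this example and recomputes the percentages. The function names are hypothetical, and dropping the fractional part is an assumption that happens to reproduce these two figures; it is not a statement about how the search program computes them.

```python
import re

# Matches each "Label = count/length (percent%)" item on a summary line.
SUMMARY_ITEM = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {label: (count, length, stated_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in SUMMARY_ITEM.finditer(line)
    }


def truncated_percent(count, length):
    """Percentage with the fractional part dropped (assumed convention)."""
    return count * 100 // length


stats = parse_summary("Identities = 255/330 (77%) , Positives = 289/330 (87%)")
for label, (count, length, stated) in stats.items():
    print(label, stated, truncated_percent(count, length))
```

A mismatch between the stated and recomputed value on other lines may point to a digit that was misread during scanning rather than to an error in the original.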

Example 2287 

A DNA sequence (GBSx2421) was identified in S.agalactiae <SEQ ID 7055> which encodes the amino 
acid sequence <SEQ ID 7056>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3163 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
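
The "Final Results" block of each example lists one certainty value per candidate location, and the predicted location is the one with the highest value. A minimal sketch of that reading, using the values of Example 2287, is given below. The regular expression assumes the cleaned line layout shown in the string, and the names RESULT_LINE, parse_final_results and best_location are hypothetical.

```python
import re

# Assumed layout of one result line:
# bacterial cytoplasm --- Certainty=0.3163 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"bacterial (\w+)\s*-*\s*Certainty=([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    return {
        m.group(1): (float(m.group(2)), m.group(3))
        for m in RESULT_LINE.finditer(text)
    }


def best_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


report = """\
bacterial cytoplasm --- Certainty=0.3163 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(report)
print(best_location(results), results[best_location(results)])
```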

Example 2288

A DNA sequence (GBSx2422) was identified in S.agalactiae <SEQ ID 7057> which encodes the amino 
acid sequence <SEQ ID 7058>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9007> which encodes amino acid sequence <SEQ ID 9008> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AaD5662a GB:AF165218 Bta [Streptococcus pneumoniae] 
Identities = 30/97 (30%), Positives = 50/97 (50%), Gaps = 3/97 (3%)

Query: 50 KALVSKSQQSEATIFIGRPTCQYCRAFLPKLLKS(3ATLHSKiyYLDSQKYKG-KRLKSFF 108 

+A + ++ AT FIGR TC YCR F L A + iy+++S++ I1++F 
Sbjct: 18 RAQEA]J3KKETATFFIGRKTCPYCRKFAGTLSGVVAETKAHIYFINSEEASQIiNDIiQRER 77 


Query: 109 KKHHITTVPNLAHYQQGKMTKYLVQGSQATPQQIQTF 145 

++ I TVP H G++ + S + Q+I+ F 
Sbjct: 78 SRYGIPTVPGFVHITDGQIN--VRCDSSMSAQEIKDF 112 

SEQ ID 9008 (GBS134) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 40 (lane 2; MW 17kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 4; MW 42kDa).

GBS134-GST was purified as shown in Figure 204, lane 10. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2289 

A DNA sequence (GBSx2423) was identified in S.agalactiae <SEQ ID 7059> which encodes the amino 
acid sequence <SEQ ID 7060>. Analysis of this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0735 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9603> which encodes amino acid sequence <SEQ ID 9604> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06309 GB:AP001516 unknown conserved protein [Bacillus halodurans]
Identities = 78/178 (43%), Positives = 115/178 (63%), Gaps = 3/178 (1%)

Query: 3 MRWAGTFGGRPLKTLDGKTTRPTTDKVKGaiFmiGPFFEGGRVLDLFSGSGSLAIEAI 62 

MRV+AG G UC + G TRPTTDKVK AIEIJMIGPPF+GG LDL+ GSG L IEA+ 
Sbjct: 1 MRViaQEQKGLTIiKAVPGHKTRPTTDKVKEAIESMIGPFEDGQIGLDlirG^^ 60 

Query: 63 SRa^mQAVLVEKDRI»QWIQE^IIJWKSPEQFQLLK^m^IRaI.EQLTGQ---PDLV^ 119 

SRG+++ + V++ +RA I++N-H+ + ++ + +A RaL+ LT + P V LD 

Sbjct: 61 SRGVEimiFVDQQKRaiETIRQPSnJSHCXSLEGRaEVYiaC^ 120 

Query: 

PPYAK+ 1 + 1+ + GLL H 

Sbjct: 121 PPYAKQTIKNDIAILANHGLLE: 

There is also homology to SEQ ID 132. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2290 

A DNA sequence (GBSx2424) was identified in S.agalactiae <SEQ ID 7061> which encodes the amino
acid sequence <SEQ ID 7062>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4984 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

[Database hit header truncated in the scanned source; it ends "... pneumoniae bacteriophage MM1]". The Query/Sbjct labels and residue rows of this alignment were scattered; the surviving lines follow in their original order.]

1+ +IL ++B QRAYIi GaPIiH- G++R+P+S6KYQLEI SVYLDHAQ +A+L+++F+LnA

KV+E K GAVTYLQ+AEDIMDFLIVI AM+ARD FE +K++RETRND+NRAHN ETANIA

RT++ASMKTINNI KI D' +G + LP DL++VAQ+R+ HPDYSIQQ+ADSL TPL+KSGV

NHRIiRKINKIADEL



There is also homology to SEQ ID 5540: 

Identities = 186/254 (73%) , Positives = 227/254 (89%) 



Query: 2 tBRHIYSmEEHXHLQPEIKYHQKXmRKimVYTVFIEEKVDVirADLKr^^ 61 
+ R+IYS++E+ + PEI+YHQKrNLRKNRVYTV++E+ V+ ILADLKLAD+FFG+ETG 
Sbjct: 50 IfiRyIYSLlEnRYVIVPEIRyHQKTOT,RK^KVYTVyVEQGVETIIlZU3LK^ 109

Query: 62 lEHSILDNDENGIAYLRGAFLSTGTVREPDSGKYQLEIFSVYIiDHaQDIAHMKK^^ 121 

IE +L +D GR+YL+GAFL+ G+H-R+P+SGICi-QLEI+S"iArLDHAQDIi& LM+KFMLDA 
Sbjct: 110 IEPQVLSDDNAGRSYLKGAFLAAGS:RDPESGICYQLEIYSVYLDHAQDIAQLMQKFMLDA 169 

Query: 122 KVIEHKHGATCYLQKAEDIMDFLIVIDAMEARDAFEEIKMIRETKraDIWRaHNVETflNIA 181 

K lEHK GAVTYLQKAEDIMDFLI + I AM ++ FE IK4-+RE RICDIWRAMH ETANIA 
Sbjct: 170 KTIEHKSGAVTYLQKAEDIMDFLII-GAMSCKEDFEAIKLLREARNDIflRAIWAETANIA 229 

Query: 182 RTITASMKTIMNIIKIMDTIGFDALPSDIiRQWVQVRVAHPDYSIQQIADSLETPLSKSGV 241 

+TI+ftSMKTllINIIKIMDTIG ++LP +L+CWAQ+RV HPDYSIQQ+AD+LE P++KSGV 
Sbjct: 230 KTISaSMKTimilKIMDTIGLESLPIELQQVaQLRVKHPDYSIQQVAnaLEFPITKSGV 289 

Query: 242 NHRIiRKINKIADEL 255 

NHRLRKINKIAD+L 
Sbjct: 290 NHRLRKINKIADDL 303 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2291 

A DNA sequence (GBSx2425) was identified in S.agalactiae <SEQ ID 7063> which encodes the amino 
acid sequence <SEQ ID 7064>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0297 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.
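
Every example opens with the same sentence pattern, so the identifiers it carries can be collected into a small record. The sketch below does this for the opening sentence of Example 2291. ExampleRecord, HEADER and parse_header are hypothetical names, and the pattern is an assumption based on the wording used in this section.

```python
import re
from typing import NamedTuple


class ExampleRecord(NamedTuple):
    gbs_id: str
    organism: str
    dna_seq_id: int
    protein_seq_id: int


# Assumed wording of the opening sentence; \s+ tolerates the line break
# between "amino" and "acid".
HEADER = re.compile(
    r"A DNA sequence \((GBSx\d+)\) was identified in (\S+) "
    r"<SEQ ID (\d+)> which encodes the amino\s+acid sequence <SEQ ID (\d+)>"
)


def parse_header(text: str) -> ExampleRecord:
    """Extract the GBSx identifier, organism and SEQ ID numbers."""
    m = HEADER.search(text)
    if m is None:
        raise ValueError("header sentence not found")
    return ExampleRecord(m.group(1), m.group(2), int(m.group(3)), int(m.group(4)))


record = parse_header(
    "A DNA sequence (GBSx2425) was identified in S.agalactiae <SEQ ID 7063> "
    "which encodes the amino\nacid sequence <SEQ ID 7064>."
)
print(record)
```

In the examples of this section the protein SEQ ID is the DNA SEQ ID plus one, which gives a simple consistency check on parsed records.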

Example 2292 

A DNA sequence (GBSx2428) was identified in S.agalactiae <SEQ ID 7065> which encodes the amino 
acid sequence <SEQ ID 7066>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2706 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54571 GB:AJ006393 response regulator [Streptococcus pneumoniae] 
Identities = 139/190 (73%), Positives = 166/190 (87%) 

Query: 8 IKIVLVDDHEMTOLGLKSFIlNIK3ADVEVlGEASNGLEGIKKADEIlRPDVVV^ro 67 

+KI+LVDDHEMVRLGLKS+ +LQ DVEV+GEASNG +GI ALELRPDV+VMD+VMPEM4- 
Sbjct: 1 ^KILLVDDHE^(n^^I^IJKSyFDMDDVEVVGEASNGSQGIDIALEIJlPDVIVlyIDlV^^ 60 

[Remaining rows of this alignment: Query/Sbjct labels and residue rows were scattered in the scanned source; the surviving fragment follows.]

5 LTARERD+L L+AEGY+NQRIflD+LFISLKTVKTHV

There is also homology to SEQ ID [number not recoverable from the source]:

Identities = 158/198 (79%), Positives = 176/198 (88%), Gaps = 1/198 (0%)

[Residue rows not recoverable; surviving match lines follow in their original order.]

E+ GVERTL +IiK W BA +LVLTSYIiDHBKITOVI+fiG&KGYMLKTSSAaEIIMAIRKV

3 KVDKKIKfiHD+ P LHE LTARE DlL+IiIAKSYDNQ ISDEIiFISIiKrVK

THVSNIL KIi G R

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2293 

A DNA sequence (GBSx2429) was identified in S.agalactiae <SEQ ID 7067> which encodes the amino 
acid sequence <SEQ ID 7068>. This protein is predicted to be histidine kinase (narQ). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3944 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query; 1 MIDNGIGFDMDSVYDDSYGLKNIEDRVEDLAGNLQLriSQPGKGVftMDIRLPLVNQ 55 

++DNGIGF + S+ DLSYGL+1II++RVED+AG +QI.L+ P +G+A+DIR+PL+++ 
Sbjct: 276 VVDNGIGFQLGSLDDLSYGLENIKERVEDMAGTVQLLTAPKQGUWDIRIPLLDK 330 

There is also homology to SEQ ID 2992: 

Identities = 44/59 (74%) , Positives = 51/59 (85%) 

Query: 1 MIDNC3IGFDM3SVYDLSYGIiKNIEDRVEDLAGNLQI.LSQPGKBVaMDlRI.PLVNQSEDK 59 

MID+G+GFDMD V DtBYGLKHIEDRV DLAGNL L+SQ GK3V+MDIRLP+V +D+ 
Sbjct: 276 MIDIX3VGFDM3QVRDLSYGLKHIEDRVl©iyiGNimiSQKGK3VSMDlRLPIVKm 334 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2294 

A DNA sequence (GBSx2430) was identified in S.agalactiae <SEQ ID 7069> which encodes the amino
acid sequence <SEQ ID 7070>. This protein is predicted to be RfbQRS0155-l. Analysis of this protein 
sequence reveals the following: 
Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1120 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 7072: 

Identities = 171/172 (99%) , Positives = 172/172 (99%) 

Query: 1 MGQmVEEKSNEIVAIPQmRTIDIRKBIVTirmGTQTAIVDTIIKGKADYCLAVItmQ 60 
20 +GIQVRVEEKSNEIVAIPQIjaiTIDIRK3IVTIimGlW'AIVDTIIKGKaDYCIAVKt3NQ 

Sbjct: 143 IK3QVaVEEKSNEIVAIPQLLRTIDIRKSIOTIIffiMaTQTAIVDTIIKGKa0yCIJ^^ 202 

Query: 61 ETLTODIALYFSDVOTjLEEUJENAQT^QTVEKSRGQIEVREYWSSDIKMAJtnffiK^ 120 
BTLVDDIALYFSDVNLLEEiLQEimQYYQTVEKSRCSQIEVREVWSSDIKMLCQNHPKWHK 
25 Sbjct: 203 ETLYDDIALYFSDVNLLEELQENAQYYQTVEKSRGQIEVRETOVSSDIKSttCCSNHPIOT 262 

Query: 121 LRGIG^mOTIDKI)GQLSQENRYFIFSFKPDVLTFA^ICVRGHWQIESMHWLL 172 

IiRGIGOTRimiDKDGQLSQEimYFIFSFKHJVIjTFaNCVRGHWQIESMHWLL 
Sbjct: 263 LRGIGMimriDKDGQLSQEiniYFIFSFKPDVLTFflNCVRGHWQIESMHWIiL 314 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2295 

A DNA sequence (GBSx2431) was identified in S.agalactiae <SEQ ID 7073> which encodes the amino 
acid sequence <SEQ ID 7074>. This protein is predicted to be translation initiation factor if-3 homolog dsg
(infC). Analysis of this protein sequence reveals the following: 
Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1787 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA68920 GB:Y07640 translation initiation factor, IF3 [Listeria monocytogenes]
Identities = 112/169 (65%), Positives = 134/169 (79%) 

Query: 7 KDLFIKDEIRVREVRLVGLEGEQLGIKPLSEAQAIADDANVDLVLIQPQATPPVAKIMDY 65 
50 KD+ +ND IR REVRI,+ +GEQLG+K +A ZA+ AN+DLVL+ P A PPVA+IMDY 

Sbjct; 3 KDMLVNDGIRAREVRLIDQDGEQLGVKSKIDALQIAEKANLDLVLVAPTAKPPVARIMDY 62 

Query: 67 GKFKFEYQKKQKEQRKKQSVVTVKEVRLSPVIDKGDFETmoSIGRKPLEKGN^^ 126 
GKF+FE QKK KE RK Q V+ +KEVKLSP ID+ DP4-TKtRN RKPLEKG+KVK SIRF 
Sbjct: 63 GKFRFEQQKICDKEARKNQKAriVMKETOLSPTIDEHDFDTKLRmRKFLEKGDKVKCSIRF 122

Query: 127 KGRMITfKEIGAKOTAKFAEATQDIAlIEQRAKMDGRQMFMQLAPIPDK 175 

EGR ITHKEIG KVL FA+A +D+ lEQR KMDGR MF+ LaP+ +K 
Sbjct: 123 KGR&ITHKEIGQKOTjDRFAKACEDLCriEmPKMiraRSMFLVIAPLHEK 171 

A related DNA sequence was identified in S.pyogenes <SEQ ID 7075> which encodes the amino acid 
sequence <SEQ ID 7076>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2247 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 167/176 (94%), Positives = 173/176 (97%)

Query: 1 MKIIAKKDIiFIKDEIRWEVRLVGLESEQLGIKPLSEAQAIADiaNVDLVI.IQPQATPP^ 60

+KIIAKKDLFIHDEIRVREVRLVGI.EGEQi:iGIKPLSEAQ++RD +NVDLVIiIQP(2A PPV 
Sbjct: 1 VKIIAKKDLFIMDEIRVREVRLVGLESEQLGIKPLSEaQSLADASNVDLVlIQPQAVPPV 60 

Query: 61 AKIMDYGKFKFEYQKKQKEQRKKQSVVTVKEVRLSPVIDKGDFETKLRNGRKFLEKGNKV 120 

AK+MDYGKFKFEYQKKQKEQRKKQSWTVKEVRLSPVIDKGDFETKLRNGRKFLEKC3NKV 
Sbjct: 61 AKLMDYGKFKFEYQKKQKEQRKKQSWTVKEVRLSPVIDKGDFETKLRNGRKFLEKGNKV 120 

Query: 121 KVSIRFKGRMITHKEIGAKVLAEFAEATQDIAIIEQRAKMDGRQMFMQLAPIPDKK 176

KVSIRFKGRMITHKEIGAKVLA+PAEATQDIAI lEQRAKMDGRQMFMQLAPI DKK 
Sbjct: 121 KVSIRFKGRMITHKEIGAKVLADPAEATQDIAIIEQRAKMDGRQMFMQLAPISDKK 176 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2296 

A DNA sequence (GBSx2432) was identified in S.agalactiae <SEQ ID 7077> which encodes the amino
acid sequence <SEQ ID 7078>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1807 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MAAKWKAGVEEVXIRSVFTCNTRHGVCRHCYGINLATGDAVEVGEAVGTIAAQSIGEPG 60 

MA +W AGV EV IRSV TCNTRHGVCRHCYGINLATGDAVEVGEAVGTIAAQSIGEPG 
Sbjct: 122 MftRQVVNAGVTEVTIRSVLTCNTRHSVCEHCYGINIi&TGnAVBVGEAVGTIAAQSIGEPG 181 

Query: 61 TQLTORTFHTGGWASNTDITQGLPRIQEIFBARNPKSEAVITBVKGEVVAIBEDaSTRTK 120 

TQLTMRTFHTGGWAS++D1TQGLPR+QBIFBARNPKGEAV1TEVKGEV AIEED+STRTK 
Sbjct: 182 TQLTMRTFHTGGWASSSDITQGLPRVQEIFEARNPKGEAVITEVKGEVTAIEEDaSTRTK 241 

Query: 121 KWVKGQTGEGEYVVPFTfiRMKVEVGDEVARGRRLTEGSIQPKRLIiEVRDTLEiTOTYIi^ 180 

KVFVKGQTGEGEYVTOFT3fflMKVEVeD+V+RGaUU,TEC3SIQPK LL VRD LSVETYLLA 
Sbjct: 242 KVFVKGQTGE6EYVVPFTAEMKVEVGIX3VSRGaALTEGSIQPKHIJ»VRDVLSVETYIiLA 301 

Query: 181 EVQKVYRSQC3VEIGDKHVKV^IVRQIymRKVRWlDPGDTDLIlPGTLlym 240
EVQKVyRSQGVEIGDKH+EVMVRQM+RKVRVMDPGDTDLL GTLMDI+DFTDJU)J+D+VIS 
Sbjct: 302 EVQKOTRSQGVEIGDKHIEV^IVRQMIRKVRV^C)PGDTDLIJ^GTL^mITDFTDiai^?DVVIS 361 


Query: 241 GGIPATSRPVLMGITKASLETNSFLSAASFQETTRVLTDAAIRGKK 286 

GG+PAT+RPVLMGITKASLETNSFLSAASFQBTTRVLTDAAIRGKK 
Sbjct: 362 GGVPATARPVLMGITKASLETWSFLSAASFQETTRVLTDAAIRGKK 407 

There is also homology to SEQ ID 384.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2297 

A DNA sequence (GBSx2434) was identified in S.agalactiae <SEQ ID 7079> which encodes the amino 
acid sequence <SEQ ID 7080>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0352 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2298 

A DNA sequence (GBSx2435) was identified in S.agalactiae <SEQ ID 7081> which encodes the amino 
acid sequence <SEQ ID 7082>. This protein is predicted to be acetoin dehydrogenase (TPP-dependent)
beta chain (pdhB). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0266 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04496 GB:AP001509 acetoin dehydrogenase (TPP-dependent) beta
chain [Bacillus halodurans]
Identities = 37/57 (64%), Positives = 50/57 (86%)

45 Query: 1 IttlEEFGaKRVRDTPISBftaIAGSAIGaaQTGIlRPIVDLTF^mEV^IflMDAIVDDCIR 57 

M+EEFG++RVR+TPISE3iAI+G+AIQAA TG+RPI++L F DF+TIAMD +V+ + 
Sbjct: 44 MIEEFGSERVEBTPISEaaiSGTAIGaaLTGMRPlLELQFSDPITIAMDSMVNQAaK 100 



There is also homology to SEQ ID 4272. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 2299 

A DNA sequence (GBSx2436) was identified in S.agalactiae <SEQ ID 7083> which encodes the amino 
acid sequence <SEQ ID 7084>. This protein is predicted to be Structural protein. Analysis of this protein
sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3015 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 5 IKACmiPKPELVTEIMSKVKGHSTriBKLSGQTPIPENGVEQEVFIJIiDGlIAQIVGEGEQKL 64 

■1- GILF P LVT4-++SKV G S++R+LS Q PIPFNG + F F +D +V E +K 
Sbjot: 3 LNKCTLFDPTIiVTDLISKWaGKBSiaiOiSJiQKPIPPNGEaCSreTFTim 62 

Query: 65 GtWiUOTTSKIIKPIjKPVYQRRMTDEFKXASEEKRIiNFLKHYWXSFAKKMaE^ 124 

+ + + P+K Y AR++DEF YAS+E+++N L+ + DGFAKK+A D+ A HG 
Sbjct: 63 HGGVTIAPQTMVPIKVEYGaRISDEFMyASDEEKINILQEE^TCFAKKVaRGIDLMAFHG 122 

Query: 125 liEPRTMT 131 

Sbjct: 123 VNPRLGT 129 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2300 

A DNA sequence (GBSx2439) was identified in S.agalactiae <SEQ ID 7085> which encodes the amino 
acid sequence <SEQ ID 7086>. This protein is predicted to be surface protein Rib. Analysis of this protein
sequence reveals the following: 
Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1892 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2301 

A DNA sequence (GBSx2440) was identified in S.agalactiae <SEQ ID 7087> which encodes the amino 
acid sequence <SEQ ID 7088>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2227 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2302 

A DNA sequence (GBSx2441) was identified in S.agalactiae <SEQ ID 7089> which encodes the amino
acid sequence <SEQ ID 7090>. This protein is predicted to be integrase. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2948 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9319> which encodes amino acid sequence <SEQ ID 9320> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB96616 GB:AJ400629 integrase [Streptococcus pneumoniae
bacteriophage MM1]
Identities = 84/238 (35%), Positives = 137/238 (57%), Gaps = 8/238 (3%)

Query: 1 MTIiDKNSSQRQKKAGIiILQEKIEDRLMEiraSEMTXGELKKEYLKQMIPTVKDSTro^ 60
+T++K + QA+ +A ++LQEKI +L+ + +T+ E+ + K W TVK+STK
Sbjct: 30 [...]ilTFEEIYNLFyKSWRQTVKESTKHNCK 89

[Rows Query 61 to 174 / Sbjct 90 to 207: residue rows not recoverable from the scanned source; one match line survives.]

V+P DTI+ L +R ++ I+K+++ NY K R RL IF+YA+Q y+

Query: 175 [residue row not recoverable from the scanned source]
TG+RYGEL+ L IDFEN +11 +D + TKrSRI VS++++-1- ■)-
Sbjct: 208 TGMRYGELTALQLKNIDFENNKIEITGNFDSVNKIKrLPKTTNSIRTIKVSESVIEAI 265

There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2303 

A DNA sequence (GBSx2444) was identified in S.agalactiae <SEQ ID 7091> which encodes the amino 
acid sequence <SEQ ID 7092>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2518 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
There is also homology to SEQ ID 4212: 

Identities = 92/144 (63%) , Positives = 118/144 (81%) , Gaps = 1/144 (0%) 

Query: 1 MPKYSLFELENGRRRIM,SaGELQK(3NELAIiPTQFMKFLYIJVSRVlffiSKGKPEEIEK^^ 60

+PKySIiFELENGR+R+riaSflGELQKailELaLP++++ FLYLAS Y + KG PE+ E+KQ 
Sbjct: 1198 LPKYSLFELENGRKRMLBSRGELQRGNEIMPSKyVNFIiYIiASIOT 1257 

15 Query: 61 FVNQHVSYFDDILQLINDFSKRVIIADANISaNKLYQDNKENISTOELA^ 120 

F7 (2H Y D+I++ I++FSKRVIIjAnaNri+K+ Y +++ + E A NII+LFT T 
Sbjct: 1258 FVEXJHKHYLDEIIEQISEFSKRVILADW^KVIiSAYWKHRDK-PIRKCJl^ 1316 



Query: 121 SLGAPAAFKFFDKIVDRKRYTSTQ 144 

20 +LGAPAAFK+FD +DRKRYTST+ 

Sbjct: 1317 NLGAPAAFKYFDTTIDRKRYTSTK 1340 



Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2304

A DNA sequence (GBSx2445) was identified in S.agalactiae <SEQ ID 7093> which encodes the amino 
acid sequence <SEQ ID 7094>. This protein is predicted to be 0- Analysis of this protein sequence reveals
the following:

Possible site: 48
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 239 - 255 ( 236 - 256)

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MARLGADFYSKlVTDI^KDGFETKFYQQTGVFLLKKDESQriESLFJUaJDKRRLESPLIGD 60 

+A+ GA +Y L+ L+KDG Y++ G + D S+L+ + A KRR ++P IGD 

Sbjct: 61 LAKGGARYYKDLIHQLEKDGESDTGYKRVGAISIHTDASKLDKMEERAYKRREDAPEIGD 120 

Query: 61 LQIIiNKSEaNTHFPEL-DGYEQLLYAS3-3ARVEGADLTRILIjEAS---GWVIKDEVHF- 115 

+ L+ SE FP L DGYE ++ SG ARV G L R LL A+ G VIK 
Sbjct: 121 ITRr,SASETKKLFPILADGYES-i/HISaBARVHGRALCRSLLSAAEKRGATVIKGNflSLL 179 

Query: 116 TITDNGFRVQGIDFDKIiVLASGBmiAKIIjDEHHYQVDVRPQKGQLRDYYFSNIMTG 171 

T+T + D +++ +GAW +1L V QK Q+ + ++ +TG 

Sbjct: 180 FENGTVTCVQTDTKQBWajAVIVTAGAWANEILKPMIHPQVSFQKAQIMHFEiy^ 239 

Query: 172 KYPVVMPBGELDIIPFtMSKTOVGASHENDMAF-DIiHIDFKVtDKFEEQRIGYFPQIiK^ 230 

+PVVMP + 1+ PnNG++ GA+HEND DL + + +A+ P L 

Sbjct: 240 SWPWMPPSDQYILSFDNQRIVAGATHENDAGLDDLRVTAGGQHEVLSKALAVaPGLAIlA 299 

Query: 231 IRLLKRVEFVPIOVIFL 247 

Sbjct: 300 AAVETRVGFRPFTPGFL 316

There is also homology to SEQ ID 2656. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2305 

A DNA sequence (GBSx2446) was identified in S.agalactiae <SEQ ID 7095> which encodes the amino
acid sequence <SEQ ID 7096>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2572 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9315> which encodes amino acid sequence <SEQ ID 9316> 

was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 6 QILDKIKEYI^'IIIHRHMRPDPDMlGSQIGLRDIIIUmFPKKK^^ATGFDEPTI^^ 65
+++ I ■XDTII+HRH+REDPna GSQ GL +1+R +P+K + A G EP+L+++ +
Sbjct: 4 ELIRTISLYDTIILHRHVREDPDAyGSQCGLTEILRETYPEKNIFAVGTPEPSLSFLYSL 63

[Rows from Query 66 / Sbjct 64 onward: Query/Sbjct labels and residue rows were scattered in the scanned source; the surviving lines follow in their original order.]

¥ y+GALV+V DTAN RIDD+RY G L+KIDHHPN++ YGDL +VDT4-ASS

f GIVGDTGRFL+P TT KTLK A +L +

h KIi GFIF+ + + +NaAA V + ++ Li++F T +E 4

h +W FV++ D
[RAMVFFVEEDD 259

A related DNA sequence was identified in S.pyogenes <SEQ ID 7097> which encodes the amino acid
sequence <SEQ ID 7098>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2584 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 180/256 (70%), Positives = 215/256 (83%)

Query: 4 FQQimKIKEYDTIIIHRHtffiPDPDALGSQIGLRDIIRHNFPKKmiATGFD 63
Sbjct: [residue row not recoverable from the scanned source]

Query: 64 KMDQVTDQDYQGALWVTDTMTPRIDDBRyKKGDFLIKIDHHPNDEVYGDLSYVDTlO^ 123

+MDQVTD+DY+ MiV++TDTAK PRIDDERY G LIKIDHHPND+VYGD YVDT+AS 
Sbjct: 65 QhmQVTOKDYKEALVIITiraUlRPRIDDEROTLGKCLIKIDHHPmDVyGDEyYVDTS^ 124 

Query: 124 SASErVTDFMiSCDI^STSAftRVLYNGIVGDTGRFLYPATTSKrLKIASKLREFDEDFS 183 

SASEI4 DPA S +L LS AA++LY GIVGDTGRFLY +TTSKTL lAS+LR F+FDP+ 
Sbjct: 125 SASEIIADPAFSQNLTLSDKAAKLLYTGIVGDTGRFLYASTrSKrLSIASQLRHFBEDFA 184 

Query: 184 AMARQMDSFPFKIAKLQGFIFEQLKIDKNGAACOTLTQEDIjKRFDVTDAETAAIVGVPGK 243 

A4-+RQMDSFP KIAKLQ ++PE h ID++Gfta V ++QE LK FDVT AE++AIV PGK 
Sbjct: 185 AISRQl^roSFPLKlAKLQSYVFEHLTIDESGAAYVLVSQETIiKHFDVTLABSSAIVCRPGK 244 

Query: 244 IDIVESWAIFVKQSDG 259 

ID V++WAIFV+ +DG 
Sbjct: 245 IDNVQAWAIFVELTDG 260 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2306 

A DNA sequence (GBSx2447) was identified in S.agalactiae <SEQ ID 7099> which encodes the amino
acid sequence <SEQ ID 7100>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1846 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB42949 GB:AL049863 putative adenosine deaminase [Streptomyces coelicolor A3(2)]
Identities = 123/343 (35%), Positives = 175/343 (50%), Gaps = 26/343 (7%)



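[Row Query 6 / Sbjct 11: not recoverable from the scanned source.]

Query: [...] FEFIRPLLQ^KEaIKFAAYDVARQftAI™VIYIEIRFAPELS^mKGLTASDTVLAVI.EGL 124
FE ++Q +E L AA + A + V+Y E+R+APEL+ GL+ + V V EGL
Sbjct: 71 FEHTLAVMCJNREGLLRAAEEYVLDLAADGVVyGEVRYAPEIOTRGGLSMRBVVBTVQE^^ 130

[Rows from Query 125 / Sbjct 131 onward: Query/Sbjct labels and residue rows were scattered in the scanned source; the surviving match-line fragments follow in their original order.]

I +L + G +R+GH

+A EMC TSNLC3T AA+SI P L D G ++T+NTDNR VS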

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 2307 

A DNA sequence (GBSx2448) was identified in S.agalactiae <SEQ ID 7101> which encodes the amino
acid sequence <SEQ ID 7102>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2042 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9639> which encodes amino acid sequence <SEQ ID 9640>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13290 GB:Z99111 similar to sulfite reductase [Bacillus subtilis]
Identities = 63/146 (43%) , Positives = 87/146 (59%) , Gaps = 1/146 (0%) 

Query: 5 MaiiAKIVYASMTSNTEEIADIVaDKLRDLGLDVEVEECTMVr^^ 63 

MR +VYA+M+(aSITE +2UD++ L++ +V+ E +D A F D D 1+ TYT+ 
Sbjct: 1 MAKlLLVYATMSGNTEAMaDLIEKBLQEaiAEVDRFEamiDnaQLFro 60 

Query: 64 GDGDLPDEIVDFYEDLAEVDLSGKVYGWGSGDTFYDYFCKSVDEFEAQERLTGRQKGAD 123 

GDGDLPDE +D ED+ E+D SGK V GSGDT Y++FC +VD EA+ G 
Sbjct: 61 GDGDLPDEFLDLVEDMEEIDFSGKTCAVFGSGDTAYEFFCGAVDTLEAKIKERGGDIVLP 120 

Query: 124 CVKVDLAftEDBDIENLEAFAEBIASK 149 

VK++ E E+ E L F + A K 
Sbjct: 121 SVKIENNPEGBEEEELINFGRQFRKK 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 7103> which encodes the amino acid 
sequence <SEQ ID 7104>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1641 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 116/147 (78%) , Positives = 136/147 (91%) 

Query: 5 ^ra^KIVYASMTGI^ITEEIADIVaDKLRDLGLDVEVEEC'IMVDAADFEDADIAIVATYTYG 64 

MALAKIVYASMTGNTEEIADIVA+KL++LG DV+++ECT VDA+tFE+ADIA+VATYTYG 
Sbjct: 1 tmiAKIVYASMTGNTEEIADIVaNKLQELGHDVDIDECrrVDASEFENADIAVVATYTYG 60 

Query: 65 DGDLPDEIVDFYEDLAEVDLSGKVYGWGSGDTFYDYFCKSVDEFEAQFALTGAQKGADC 124 

DGDLPDEIVDFYEDL ++DL GK+YGWGSGDTFYDYFCKSVD+F QFALTGA KGA+ 
Sbjct: 61 DGDLPDEIVDFYEDLQDLDLEGKIYGWGSGDTFYDYFCKSVDDFSEQFALTGAIKGAEP 120 

Query: 125 VKVDLAAEDEDIEOTjEAFAEElASKm 151 

VKVDI1AAEDEDI+ LEAFAE+++ +N 
Sbjct: 121 VKVDLAAEDED1DRLEAPAEQI,SQAVM 147 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2308 

A DNA sequence (GBSx2449) was identified in S.agalactiae <SEQ ID 7105> which encodes the amino
acid sequence <SEQ ID 7106>. Analysis of this protein sequence reveals the following: 
Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3568 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB98234 GB:U67480 chorismate mutase/prephenate dehydratase
(pheA) [Methanococcus jannaschii]
Identities = 26/85 (30%) , Positives = 46/85 (53%) , Gaps = 1/85 (1%) 

Query: 2 ELEEIRQEIDEIDQQIiVSIiUaTRMGLILEVIAFKKKHRLFVT^NREt^^ 61

+L EIR++IDEID +++ L+ R L +V K + +P+ D RE + + + K + 
Sbjct: 4 KIAEIRKKIDEIDNKILKLIftERNSIJUajVREIKNQLGIPIiroEEREKriYDRIRiaCKE 63 

Query: 62 HQFDDVIRATFKDIMTE-SRVYQKE 85 
H D+ I 1+ E ++ QK+

Sbjct: 64 HNVDENIGIKIFQILIEHNKftLQKQ 88 

There is also homology to SEQ ID 1568. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
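
Each "Final Results" block lists a certainty value for the cytoplasm, membrane and outside compartments. As an illustrative sketch only (the helper names `parse_results` and `best` are hypothetical, and this is not the prediction program itself), the rows can be read into tuples and the highest-certainty compartment selected. The pattern tolerates stray spaces inside the number, since the printed text sometimes separates the digits.

```python
import re

# One result row: compartment name, certainty value, verdict in parentheses.
# Spaces inside the number (e.g. "0 . 3568") are accepted and removed later.
ROW = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9 .]*?)\s*\(([^)]*)\)"
)

EXAMPLE = """\
bacterial cytoplasm --- Certainty=0.3568 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""


def parse_results(text):
    """Return a list of (compartment, certainty, verdict) tuples."""
    rows = []
    for m in ROW.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        rows.append((m.group(1), certainty, m.group(3).strip()))
    return rows


def best(rows):
    """Return the row with the highest certainty."""
    return max(rows, key=lambda row: row[1])


rows = parse_results(EXAMPLE)
print(best(rows))
```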

Example 2309 

A DNA sequence (GBSx2450) was identified in S.agalactiae <SEQ ID 7107> which encodes the amino
acid sequence <SEQ ID 7108>. This protein is predicted to be a minor structural protein. Analysis of this
protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1828 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:RAC34413 GB:AF158600 putative minor structural protein
[Streptococcus thermophilus bacteriophage Sfi11]

Identities = 39/65 (60%) , Positives = 54/65 (83%) 

Query: 1 MBVETDSQEVLMSTGLKDLKaHAyPAITyBVEGYVDLELGDVVRIQDDGyEPPIiILTARV 60 
ME++TDS++VL+ST L++L+ YPAITYBVDG++DL++GD V+IQD G+ P L+L ARV 
Sbjct: 707 MEIimtSEDVLISTALRNLRKFCypAITYEVDGFIJDIJDIGDTWIQDTGPSPMLML^ 766

Query: 61 VEQDI 65 
EQ I 

Sbjct: 767 SEQQI 771 






No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2310

A DNA sequence (GBSx2451) was identified in S.agalactiae <SEQ ID 7109> which encodes the amino
acid sequence <SEQ ID 7110>. This protein is predicted to be phosphomethylpyrimidine kinase (thiD).
Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2051 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC22074 GB:U32725 phosphomethylpyrimidine kinase (thiD)
[Haemophilus influenzae Rd]
Identities = 29/78 (37%), Positives = 48/78 (61%), Gaps = 2/78 (2%)

Query: 4 RNVLAISGNDIFSCSGGLHADIATyVVNKLHGFVaOTCLTJiMSDKG-EEOTPIEftSIIj^ 62 

+ VL I+G+D G G+ ADL T+ + + g' A+T +TA + G F++ PI ++ Q 
Sbjct: 5 KQVLTIAGSDSGGGaGIQaDLKTPQraJGVraiBAITAVTAQlTOI^^ 64 


Query: 63 LESLK-DVEFGSIKLGLL 79 

LE++K D + S K+G+L 
Sbjct: 65 LEAVKNDFQXASCKIGML 82 

There is also homology to SEQ ID 4408.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2311 

A DNA sequence (GBSx2452) was identified in S.agalactiae <SEQ ID 7111> which encodes the amino
acid sequence <SEQ ID 7112>. Analysis of this protein sequence reveals the following:
Possible site: 57

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.43 Transmembrane 109 - 125 ( 102 - 129)
INTEGRAL Likelihood = -1.28 Transmembrane 84 - 100 ( 84 - 100)

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA22372 GB:AL034446 putative transmembrane protein
[Streptomyces coelicolor A3(2)]
Identities = 25/93 (26%), Positives = 43/93 (45%), Gaps = 1/93 (1%)

Query: 62 SASVEILCRGmiPVSATKySKIVSVSISSIFFGLLHSANNHUBLIEIENLCL-EXSLFLS 120 

+A+ E++ RG L + +■!-+ ++ + FGL+H N +L + + G L+ 

Sbjct: 143 AATEEVVFRGVLFRIIEEHIGTYLAICLTGLVFGIMOjIJSEDATLWGAIAIAIEaGFMLA 202 
Query: 121 LYVILKGNIWGACGIHGAWNCVQGSVFGIEVSG 153

N+W G+H WN G VF VSG
Sbjct: 203 AAYAATRNLWLTIGVHFGWNETiAGGVFSTWSQ 235

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
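
Where transmembrane segments are predicted, each is reported on an `INTEGRAL Likelihood` line giving a score, the segment span and a wider span in parentheses. The sketch below is illustrative only: `parse_segments` and `strongest` are hypothetical names, and it assumes that a more negative likelihood marks a stronger prediction, which the text itself does not state.

```python
import re

# Score, segment start and end, then the wider span in parentheses.
SEG = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

EXAMPLE = """\
INTEGRAL Likelihood = -7.43 Transmembrane 109 - 125 ( 102 - 129)
INTEGRAL Likelihood = -1.28 Transmembrane 84 - 100 ( 84 - 100)
"""


def parse_segments(text):
    """Return predicted segments as dicts, ordered by start position."""
    segments = []
    for m in SEG.finditer(text):
        start, end = int(m.group(2)), int(m.group(3))
        segments.append({
            "likelihood": float(m.group(1)),
            "start": start,
            "end": end,
            "length": end - start + 1,  # residues in the segment, inclusive
            "outer": (int(m.group(4)), int(m.group(5))),
        })
    return sorted(segments, key=lambda s: s["start"])


def strongest(segments):
    """Assumption: the most negative likelihood is the strongest prediction."""
    return min(segments, key=lambda s: s["likelihood"])


segs = parse_segments(EXAMPLE)
for s in segs:
    print(s["start"], s["end"], s["length"], s["likelihood"])
```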

Example 2312 

A DNA sequence (GBSx2453) was identified in S.agalactiae <SEQ ID 7113> which encodes the amino
acid sequence <SEQ ID 7114>. This protein is predicted to be pppL protein. Analysis of this protein
sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5796 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA10712 GB:AJ132604 pppL protein [Lactococcus lactis]
Identities = 38/64 (59%), Positives = 51/64 (79%)

Query: 1 MEISIiLTDlGQRRSNNQDFINQPENK2«OTLIILar)GMGGHRaasriASEMTVTDLaSI^ 60

ME S+L+DIG +RS NQD++ + N+2«3 L +IiaDGMGGH+aGN+AS++TV DLG V?+ 
Sbjct: 1 MEySILSDIGSBSlSTNQDYVGTXVNRJ^QLFLIJUXaiGGHKRGNmSKLTVEDIBKLW 60 

Query: 61 ETDF 64 
ET F

Sbjct: 61 ETFF 64 

There is also homology to SEQ ID 3022: 

Identities = 58/74 (78%) , Positives = 69/74 (92%) 


Query: 1 MEISLLTDIGQRRSNNQDFINQFEinCASVPLIILADGMGGEnWGNIASEMTVTDLGOT 60 

M+ISL TDIGQ+RSNNQDFIN+F+NK G+ L+ILADGMGGHRaGNIASEMTVTDLG +W 
Sbjct: 1 MKISLKTDlGQKRSNNQDFINKFDNKRGITLVILaIX3MGGHRaGNIASE^^OTDIBRE^ 60 

Query: 61 ETDFSELSEIRDMM 74

+TDF+ELS+IRDW+ 
Sbjct: 61 KTDFTELSQIRDWL 74 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2313 

A DNA sequence (GBSx2454) was identified in S.agalactiae <SEQ ID 7115> which encodes the amino
acid sequence <SEQ ID 7116>. This protein is predicted to be sunL protein. Analysis of this protein
sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1531 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

! lactis] 



Query: 1 MSILSSVCQTLRKGGIITYSTCTIFKEEMFQVIEKFLENHPNFEQVELSHTQEDIVKRGC 60 
+ IL+S ++L+K GI+ YSTCTIP+EENF V+ +FLENHPNFEQVE+S+ + +++K GC

Sbjct: 342 LEIU^SASKSLKKSGI)WYSTCTIFDEE5NFDVVHEFLBNHPNFEQVEISNEKPEVIKEGC 401 

Query: 61 ISISPEQYHTDGFFIGQVKRI 81 
+ I+PE YHTD6PPI + K+1 
Sbjct: 402 IiFITPEMYHTDSFFIAKFKKI 422

There is also homology to SEQ ID 3018: 

Identities = 64/82 (78%), Positives = 74/82 (90%) 

Query: 1 MStLSSVCOTIJlRMIITYSTCTIFEEENFQVIEKFLEraiPNFEQVELSHTQEDIVKRJ^ 60

+ ILSSVCQTLRKBGIITOSTCTIF+EEN QVIE FL++HENFEQV+L+HTQ DIVK G 
Sbjct: 359 LEILSSVCQTLRKBGIITYSTCTIFDEENRQVIEAFIKJSHPNFEQVKUIHTQaDIVKDGY 418 

Query: 61 ISISPEQYHTDGFFIGQVKRIL 82 
+ I+PEQY TDGFFIGQV+R+L

Sbjct: 419 LIITPEQYQTDGFFIGQVRRVL 440 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2314

A DNA sequence (GBSx2455) was identified in S.agalactiae <SEQ ID 7117> which encodes the amino
acid sequence <SEQ ID 7118>. This protein is predicted to be PTS permease for mannose subunit IIPMan.
Analysis of this protein sequence reveals the following:

D N- terminal signal sequence 



INTEGRAL Likelihood 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = -0. 



Transmembrane 3 2 - 

Transmembrane 127 - 

Transmembrane 5 S - 

Transmembrane 87 - 



53 



Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 25 KVPETKSIIRLTALAPLVCSILWELVSMRELISSISFIGILVGSGPVNSFVHHIPQNLM 84 

++P T + LA +L L+++ +F+ I G+ + + +PQ L+ 

Sbjct: 126 RMPRTPIIARLHACNYIA LLftLOJFYFLCAFLPIYFGftEHRKTIIDVLPQRLI 178 

55 Query: 85 NGLSAAGGLLPAVGFAMLMKLLVmmAVFYLLGFVLTAYLKLPAVAVAAI^V^ 144 

+6L AGG++PA+GFA+IJ+K++ N +++LGFV A+IiKLP +A+A + +1 
Sbjct: 179 IX3LGVi«miMPAIGFAVIiKIMMKNVYIPYFimFVAAAraiKLPVIAIACPMiM4ftLIDL 238

Sbjct: 239 LR 240

There is also homology to SEQ ID 1636:

Identities = 104/109 (95%), Positives = 108/109 (98%)

Query: 56 LISSISFIGILVGSGPWSEVHHlPQNLMNGLSflACMIJjPAVGEMIMiaJLWTNKIlAVFY 115 

+H-SISFIGILVGSGPVN+FV HIPQNLMNGLSMGGLLPAVGFAMmLLWTNKUWFY 
Sbjct: 149 IIASISFIGILVGSGPVMAFWHIPQNI:,MNGLSAAGGLLPAVGFA^ffil!mLWTNKIlAVFy 208 

Query: 116 LLGFVLTAYLKLPAVAVAALGAVICVISSQRDIELDAITRGMSKQTTF 164 

LLGFVLTAYLKLPAVAVAALGAVICVISSQRD+ELDAITRGAISKQTTF 
Sbjct: 209 LLGPVLTAYLKLPAAffi.VAALGAVICVISSQRDLEIiDAITRGAISKQTTF 257 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2315 

A DNA sequence (GBSx2456) was identified in S.agalactiae <SEQ ID 7119> which encodes the amino
acid sequence <SEQ ID 7120>. Analysis of this protein sequence reveals the following:

Possible site: 50 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.12 Transmembrane 121 - 137 ( 118 - 144)
INTEGRAL Likelihood = -5.52 Transmembrane 91 - 107 ( 89 - 111)
INTEGRAL Likelihood = -5.20 Transmembrane 166 - 182 ( 162 - 192)

Final Results

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15963 GB:Z99124 phosphotransferase system (PTS)
beta-glucoside-specific enzyme IIABC component [Bacillus subtilis]
Identities = 76/201 (37%), Positives = 122/201 (59%), Gaps = 3/201 (1%)

Query: 1 MIKArjJ\LLLVFiaLTPSSQTYlLLNn:.PAIX3VFYPLPILIAITftAQKLKaNPILaLGTVV 60 

MIK Iri-AL'+ F + SQ +++L DG FYFLP+L+-A++AA+K +NP +A 
Sbjct: 121 MIKGLVALAVTFGWMAEKSQVHVILTAVGDGAFyFLPLLLAMSAflRKFGSNPYVAAAIAA 180 

Query: 61 MLLHPNWANLVASQKPVSLFHTIPFTLTtTCASSVIPIILIICVQAYIEKXLKQI 120

+LHP+ L+ +GKP+S F +P T Y+S+VIPI+L I + +Y+EK++ + SL+ 
Sbjct: 181 AILHPDLTALLGAGKPIS-FIGLPVTAATYSSTVIPILLSIWIASYVEKWIDRFTHASLK 239 

Query: 121 LVLVPMLIFLSMGILSFSILGPMGTIAGQYLAVIFTFLSKyASW-APAFLVGAEAPILIM 179 

L4-+VP L + L+ +GP+G I G+YL+ +L +A A FL G F+ ++IM 
Sbjct: 240 LIWPTFTLLIWPLTLITVGPLGAILGEYLSSGVNYLFDHRGLVAMIFLAGTFS-LIIM 298

Query: 180 PGVHSGIAALGITQLAKIGVD 200 

G+H +1 +A+ G D 

Sbjct: 299 TGMHYAFVPIHIHNIAQNGHD 319 

There is also homology to SEQ ID 2884. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2316

A DNA sequence (GBSx2457) was identified in S.agalactiae <SEQ ID 7121> which encodes the amino
acid sequence <SEQ ID 7122>. This protein is predicted to be glucose kinase. Analysis of this protein
sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1180 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14416 GB:Z99116 glucose kinase [Bacillus subtilis]
Identities = 32/57 (56%), Positives = 41/57 (71%)

Query: 1 WIGGGVSAAGEFLRSRVEKYFVTFAFPQVKKSTKIKIAELGNDAGIIGAASIANQQ 57 

+V+GC3GVS AGE LRS+VEK F AFP+ ++ I lA LGNDAG+IG A +A + 
Sbjct: 258 IVLGGGVSRAGELLRSKVEKTFRKCAFPRAAQAADISIAALGNDAGVIGGAWIAKNE 314 


There is also homology to SEQ ID 198. An alignment of the GAS and GBS proteins is shown below: 

Identities = 50/56 (89%) , Positives = 53/56 (94%) 

Query: 1 MVIGGGVSftAGEFLRSRVEKXFVTFAFPQVKKSTKIKIAELCOTlRGIlGaASLiaiQ 56 
+VIGGGVSaaGEFLRSR+EiCiWrF FPQV+ STKIKIAELGNDAGIIGAASLA Q

Sbjct: 264 WIGGGVSAAGEFIiRSRIEKYFVTFTFPQWYSTKIKIAELGNDAGIlGAASLARQ 319 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2317

A DNA sequence (GBSx2458) was identified in S.agalactiae <SEQ ID 7123> which encodes the amino 
acid sequence <SEQ ID 7124>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14385 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis]
Identities = 37/86 (43%), Positives = 51/86 (59%)

Query: 3 MSVILIIVILLAFVAWASWNYWVRRAAKI'IiraffiSFQKEMSRGQLlDIREafiAFHRKHIL 62

MS +++++I AF4- + +Y +R K L E F+ + QLID+RE F HIL 
Sbjct: 1 MSNMIVLIIPPAFIIYMiaSYVyQQRIMKTIiTEEEERaGYHKaQLlDVREPNEPBGGHIL 60 

Query: 63 GARNIPASQFKVALSALRKDKPVLLY 88 
GARNIP SQ K + +R DKPV LY

Sbjct: 61 GARNIPLSQI.KORKNEIRTDKPVyiiY 85 

There is also homology to SEQ ID 202. An alignment of the GAS and GBS proteins is shown below:

Identities = 51/108 (47%), Positives = 70/108 (64%)

Query: 1 MDMSVILIIVILLAFVAWASWNYWRVRRaAKFLDNESFQKEMSRGQLIDIREAGAEHRKH 60

M +++ ++L+ V + +WNY+ R+ AK +DNE+F+ M +GQL1D+RE AF KH 
Sbjct: 1 MSPITLILWLLLVGIVGYYTWNYFSFRKMAKQVDNETFKDVMRC3GQLIDLREPAAFRTKH 60

Query: 61 ILGARNIPASQFKVALSALRKDKPVLriYDASRGQSIPRIVLLLRKERF 108 

ILGARN PA QF A+ LRKDKPVIi-)-y+ R Q V L+K F 

Sbjct: 61 ILGARNFPAQQFDAAIKGLRKDKPVLiyENMRPQYRVPAVKKLKJCAGP 108

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2318 

A DNA sequence (GBSx2459) was identified in S.agalactiae <SEQ ID 7125> which encodes the amino 
acid sequence <SEQ ID 7126>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

:> N- terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1892 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2319 

A DNA sequence (GBSx2460) was identified in S.agalactiae <SEQ ID 7127> which encodes the amino 
acid sequence <SEQ ID 7128>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3522 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2320 

A DNA sequence (GBSx2461) was identified in S.agalactiae <SEQ ID 7129> which encodes the amino
acid sequence <SEQ ID 7130>. Analysis of this protein sequence reveals the following: 

0 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2770 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB18708 GB:U38906 ORF33 [Bacteriophage r1t]
Identities = 56/85 (65%), Positives = 66/85 (76%), Gaps = 1/85 (1%)

Query: 1 MnJE&TTDDVIU^QLSVDEiraiAERLLETVSDTLRIiESiSKVGiajL 59 

M FAT DD+ H-LWR L DE +R2iE LLE VSD-hLR EIA KVG++L MI E P YFA+ 
Sbjct: 1 MNPFATVDDLTMIMlPBKGDEKERAEKLLEIVSDSLREEMKVGmLYflMIJffiKPSYF;^ 60 

Query: 60 VLKSVTVDIVaRTLMTATQGEPMSQ 84

V+KSVTVDIVARTtMT+T EPM+Q
Sbjct: 61 WKSVTVDrVARTLMTSTDQEPMTQ 85

There is also homology to SEQ ID 1432. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2321 

A DNA sequence (GBSx2462) was identified in S.agalactiae <SEQ ID 7131> which encodes the amino
acid sequence <SEQ ID 7132>. This protein is predicted to be regulatory protein TypA (typA). Analysis of 
this protein sequence reveals the following: 

5 N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2238 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06351 GB:AP001516 GTP-binding protein TypA/BipA (tyrosine
phosphorylated protein A) [Bacillus halodurans]
Identities = 175/237 (73%), Positives = 204/237 (85%), Gaps = 1/237 (0%)
Query: 1 MEDIFVGETVTPTDAIEPIjPVLRIDEPTLQMTPLVNNSPFAGREGKWITSRKVEERLLAE 60

ME+I VGETV P D +PLP+LRIDEPTLQMTF1:jVNNSPFAGREGK +TSRK+EERL AE 
Sbjct: 281 MEEINVBETVCPVDHQDPLPILRIDEPTIiQMTFDVNNSPFAGREGKIOTSRKLEERLRAE 340 

Query: 61 LQTDVSLRVDPTDSPDKWTVSGRGELHLSILIETMRREGYELQVSRPEVIIKEIDGVQCE 120 
L+TDVSLRV+ TDSPD W VSGRGELHLSILIE MRREGYEIiQVSH-PEVII+EIDGVQCE

Sbjct: 341 LETDVSLRVENTDSPDMMVVSGRGELHLSIIiIEaSIMRREGYEL(3VSKPEVIIREIDGVQCE 400 

Query: 121 PFERVQIDTPEEYQGAIIQSLSERKGDMUJMQMVGNBQTRLIFIiIPARGLIGYSTEPIjSM 180 
P ERVQID PKEY GA+++Sri BRKG+ML+M G+GQ RL F++PARGLIGY+TEFLS 
Sbjct: 401 PVERVQIDVPEEYTGAVMESI.GERKGEMIl^mT^m3SGQVRLEFM\rt'ARGLIGYT^E 460

Query: 181 TRGYGIMNHTFDQYLPWQGEIGGRHRGALVSIENGKATTYSIMRIEERGNLSFVNP 237 

TRGYGI+MH+FD Y PV 6-(-+GGR +G LVS+E GKRT Y I+++E+RG + FV P 
Sbjct: 461 TROYGIINHSFDSYQPVTPGQVGGRRQGVLVBMETGKATQYGIIQVEDRGTI-FVEP 516 


There is also homology to SEQ ID 206. An alignment of the GAS and GBS proteins is shown below: 

Identities = 228/237 (96%) , Positives = 233/237 (98%) , Gaps = 1/237 (0%) 

Query: 1 MEDIFVGETVTPTDAIEPLPVLRIDEPTLQMTFLTONSPFAGREGKWITSRKVEERLLAE 60 
MEDIFVGET+TPTD +E LP+LRIDEPTLQMTFLVMNSPFAGRKGKWITSRKVEERLLAE

Sbjct: 284 MEDIFVGETITPTDCVEALPinRIDEPTLQMTFLVimSPFAGREGKMITSRra/EERIiljAE 343 

Query: 61 LQTDVStJiOTPTDSPDKmWSGRGELHLSILIETMRREGYELQVSRPEVIIKElDGVQCE 120 
LQTDVSLIOT)PTDSPDKjm?SGRGELHLSIIiIBfIMRREGYEI<}VSRPEVIIKEIDGV+CE
Sbjct: 344 LQTDVSLRVDPTDSPDKWOTSGRGEIBLSILIETMRREGyELQVSRPEVIIKEIDSVKCB 403 

Query: 121 PFERVQIDTPEBYQGMIQSLSERKGDMIiDMQMVGNGQTRLIPLIPARGLIGYSTEFLSM 180
PFERVQIDTBEEYQGiMIQSLSEREGDMLDMQMVCaiGQTRLIFLIPARGLIGYSTEFLSM

Sbjct: 404 PFERVQIDTPEEYQGAIIQSLSERKGDMLDMQMVGNGQTRLIPLIPAEGLIGYSTEFLSM 463 

Query: 181 TRGYGIMSHTFDQYLPWQGElGGSHRQAIiVSIENGKRTrYSimiEERGHljSFV^ 237
TRGYGIMNHTEDQYLPWQGEIGGRHRGaLVSIENGKATTYSItEIEERG + PVNP
Sbjct: 464 TRGyGIMNHTFDQYLPWQGEIGGRHRGAIiVSIENGKATTYSIMRlEERGTI-FVNP 519

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2322 

A DNA sequence (GBSx2464) was identified in S.agalactiae <SEQ ID 7133> which encodes the amino
acid sequence <SEQ ID 7134>. This protein is predicted to be pseudouridine synthase family 1 protein
(rluB). Analysis of this protein sequence reveals the following:
Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1950 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14248 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis]
Identities = 59/105 (56%), Positives = 85/105 (80%)
Query: 5 VKERIYPVGRLDMDTI^aLILTmSDPTDKMIHPRNEIDKmARVKGIATKE^ 64

+ +RIYP+GRtD+DT+GLL+L11ilDG+F +K++HP+ EIDK Y+A+VKGI KB LR L R 
Sbjct: 91 IPQRIYPIGRMYDTSGLLLLmiGEFJmMIPKYEIDKTYVaKVRGIPPKBLIiRKLER 150 

35 ^ G+ ++ KT PAh- ++ +D +K S+++LTIHBGRN QV++MFE 

Sbjct: 151 GIRbBEGKTAPAKAKIJIjSUJKKKQTSIIQLTIHEGRlIRQfVRRMFE 195 

There is also homology to SEQ ID 4728: 

Identities = 96/109 (88%) , Positives = 106/109 (97%) 


Query: 1 MLPQVKERIYPVGRLIOTTTGLLILTHDGDFTDKMIHPRiraiDKVYLaRVRGIATK^^ 60

+LPQVKER1YPVGRLDWDT+G+LILTNDQDFTD MIHPHjraiDKVYIJffiVKGIATKENLR 
Sbjct: 94 LLPQVKERIYPVGRI£IWDTSGVLILTiro6DFTDTMIHPRNEIDKVYIARVKGtATK^ 153 

Query: 61 PLTRGWIDGKKTKPARYTIIKVDHEKNRSWELTIHEGRNHQVKKMFE 109

PLTRG+VIDGKKTKPARY I++V+ +K+RS+VELTIHEGRNHQVKKMFE 
Sbjct: 154 PLTRGIVIDGKKTKPARYNIVRVEADKSHSIVELTIHEGRNHQVKKMFE 202 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2323 

A DNA sequence (GBSx2466) was identified in S.agalactiae <SEQ ID 7135> which encodes the amino 
acid sequence <SEQ ID 7136>. This protein is predicted to be L-ribulose 5-phosphate 4-epimerase. 
Analysis of this protein sequence reveals the following: 
N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2827 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD45716 GB:AF160811 L-ribulose 5-phosphate 4-epimerase
[Bacillus stearothermophilus]
Identities = 68/103 (66%), Positives = 82/103 (79%)

Query: 2 QEMIERVCEOTKSLPVHSLVKFTWGNVSEVDREAGLIVIKPSGVDYDQLTPENMVVTDLE 61 

+E-(-++ V EBN LP + LV FTWGNVS +DRE GL+VIKPSGV YD+LT ++MW DI, 
Sbjct: 3 EELRQRVLEaNLQLPQYRLVTFTWGNVSGIDRERGLWIKPSGVATOKLTIDDMVWDLT 62 

Query: 62 GHIVEGDENPSSDLPTHVQLYKAWPEW3GIVHTHSTEA.VGWaQ 104 

GN+VEGDL PSSD PTH+ LYK +P H-GGIVHTHST A WAQ 
Sbjct: 63 GNWEGDLKPSSDTPTHLMLYKQFPGIGGIVHTHSTWATVWaQ 105 

There is also homology to SEQ ID 4600: 

Identities = 93/103 (90%), Positives = 96/103 (92%)

Query: 2 QEmEKVCEaWKSLPVHSLVKFTWGinreEVDREAGLIVIKPSGTOYDQLTPEa^^ 61 

QEMRERVC ANKSLP H LVKFTWGNVSEV RE G IVIKPSGVDVD LTPENMWTDL+ 
Sbjct: 6 QEMRERVCaAireSLPQHGIiVKFTWGOTSEVCREI£!RIVIKl>SGVDyDLLTPENMVVTDLD 65 

Query: 62 GNIVEGDENPSSDLPTHVQi:.yKaMPEVGGIVHTHSTEAVGWAQ 104 

GH+VEGDIOTSSDLPTHV+LYKAWPEVGGIVHTHSTEAVGWRQ 
Sbjct: 66 GNWEGDIUPSSDLPTHVELYKAWPEVGGIVHTHSTEAVGMAQ 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2324 

A DNA sequence (GBSx2467) was identified in S.agalactiae <SEQ ID 7137> which encodes the amino
acid sequence <SEQ ID 7138>. Analysis of this protein sequence reveals the following:

D N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3452 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG05712 GB:AE004658 hypothetical protein [Pseudomonas aeruginosa] 
Identities = 141/200 (70%), Positives = 162/200 (80%), Gaps = 1/200 (0%) 

Query: 10 LSLGTDYETLANRFRPlFREISAGNVEREKARALPYEPIEWIiKKAGFGAVRVPSEYGGAG 69 

LS G DYE LA RFRPIF 1+ G VERE+ R LP+E I WLK+AGFGAVRVP E+GGAG 
Sbjct: 14 LSEGADYELLAQRFRPIFARIAEGAVERERQRELPHEAIAWLKQAGFGAVRVPREHGGAG 73 

Query: 70 ASIGQLFQLLIELAEADSNIPQALRAHFAFVEDRLllAPPGVDRDTWFARFVAGDLVGNGW 129 

AS+ QL QLLIELAEADSNI QALR HFAFVEDRLNA PG RD K RF!/ GDLVG W 
Sbjct: 74 ASLPQLVQLLIELAEADSNITQALRGHFAFVEDRLNAEPGPGRDRWLRRFVEGDLVGCAW 133 
Sbjct: 134 TEVGSVRLGEVLTRVSRKDDGRWVVNGSICfYSTGSLFSDWIDriYAQRDDTGADVlflAIRT 193

Query: 189 RHAGVRHSDDWDGFGQRTTG 208

GVR SDDWDGFGQRTTG 
Sbjct: 194 DQPGVRQSDDWDGFGC2RTTG 213 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2325

A DNA sequence (GBSx2468) was identified in S.agalactiae <SEQ ID 7139> which encodes the amino 
acid sequence <SEQ ID 7140>. Analysis of this protein sequence reveals the following: 

N-terminal signal

Final Results

bacterial cytoplasm --- Certainty=0.1919 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2326 

A DNA sequence (GBSx2474) was identified in S.agalactiae <SEQ ID 7141> which encodes the amino 
acid sequence <SEQ ID 7142>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2978 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 2327 

A DNA sequence (GBSx2476) was identified in S.agalactiae <SEQ ID 7143> which encodes the amino 
acid sequence <SEQ ID 7144>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.5402 (Affirmative) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2328 

A DNA sequence (GBSx2477) was identified in S.agalactiae <SEQ ID 7145> which encodes the amino 
acid sequence <SEQ ID 7146>. This protein is predicted to be mercuric reductase. Analysis of this protein 
sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2755 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 





Alignment rows start at Query 1, 61, 121, 181, 241 and Sbjct 262, 322, 382, 442, 502. Only the following match lines are legible:

IK+V+V VNG + +IE+DQLLVaTGR ENT +IJIL AAGVE G EI+I+D+ +T+N +

IYAAGDVTLGPQFVYVaAY+GG+ NAIGGmKK++I. WP VTFT P +ATVGr,TE+Q

AKE GY+VKTSVLPL AVPRA+VNRETTGVFKLV2iD++T+KVLG H+V+ENaaDVlYAA+



There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2329 

A DNA sequence (GBSx2478) was identified in S.agalactiae <SEQ ID 7147> which encodes the amino 
acid sequence <SEQ ID 7148>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3642 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2330 

A DNA sequence (GBSx2479) was identified in S.agalactiae <SEQ ID 7149> which encodes the amino
acid sequence <SEQ ID 7150>. This protein is predicted to be surface protein Rib. Analysis of this protein
sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1936 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2331 

A DNA sequence (GBSx2480) was identified in S.agalactiae <SEQ ID 7151> which encodes the amino
acid sequence <SEQ ID 7152>. This protein is predicted to be Nra. Analysis of this protein sequence
reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1510 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9383> which encodes amino acid sequence <SEQ ID 9384>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 7153> which encodes the amino acid
sequence <SEQ ID 7154>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.64 Transmembrane 22 - 38 ( 22 - 38)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 42/157 (26%), Positives = 78/157 (48%), Gaps = 2/157 (1%)

Query: 71 LLGREFIDSQHFiaDINAYFLRHFlCYCnfYFIPDFXFIJSITSRLSY--SKDLYHI.LDKGIiaD 128 

LLG ++S FK I F R FI +PD + + R +K Y+ L + + 

Sbjct: 8 LLGjNNIUJSLPFKRinVSFSRIiFISNLQVLLPDIHLFHYLRRQQKRNKSFYNTriKTIVEE 67 

Query: 129 IFNLKGtSNLTFSKHETVLLTMQLSNLiaTFLAPLSVYVISSSNIRLQTYQVMLNQYFTSK 188 

+ +G + 4-L T+QL L++T+L P+ \'Y+++++ L Ii+ YF 

Sbjct: 68 WMSAEGIVGKLPSYHLLLFTIQLEELLKTYLPPIPWLLTNNTAALDIiMTNALSIYFPPA 127 

Query: 189 lAEPFFVNYQTTQIDEKLLKKADIIIAERRYlSSLKN 225 

lA VN + + + +K +IIA+R+y++ +++ 

Sbjct: 128 lATUMPVNVEIIPFKDIVKEKQSVIIADRQYLNLIQH 164 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2332 

A DNA sequence (GBSx2481) was identified in S.agalactiae <SEQ ID 7155> which encodes the amino 
acid sequence <SEQ ID 7156>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1383 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2333 

A DNA sequence (GBSx2482) was identified in S.agalactiae <SEQ ID 7157> which encodes the amino 
acid sequence <SEQ ID 7158>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4145 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2334

A DNA sequence (GBSx2484) was identified in S.agalactiae <SEQ ID 7159> which encodes the amino 
acid sequence <SEQ ID 7160>. Analysis of this protein sequence reveals the following:

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 34 - 50 ( 34 - 50)

Final Results 

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2335 

A DNA sequence (GBSx2485) was identified in S.agalactiae <SEQ ID 7161> which encodes the amino 
acid sequence <SEQ ID 7162>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB52002 GB:JSilQ9663 hypothetical protein [Streptomyces
coelicolor A3(2)]

Identities = 61/141 (43%), Positives = 86/141 (60%), Gaps = 2/141 (1%)

Query: 3 TYFmFLKTN(3AYADIJJGTAHLPIKPKTKVAIVTCmSRLHVAQALGIiaLGDAHILRNAG 62 

T D ++ N+ YA + +P +VA+V CMD+RL + ALGIi LGD H +RMAG 

Sbjct: 5 TVTDRLVEAHERYAAAFADPGMnARPVURVaVVACMDARt.D]:iHAA^ 64 

Query: 63 GRVTDDVLRSLVISQQQLGTRBIVVLHHTDCGAQTmEAFAaQLQRDLSVDMHGHDPLP 122 

G VTDDV+RSL ISQ+ LGTR + ++HHT CG +T T E F L+ ++G 
Sbjct: 65 GVVTDDVIRSLTISQRfiLGTRSVALIHHTGCG»!ETITEE-FRHDLELEVG-QRPAMAVE& 122 

Query: 123 FNDIEESVREDVAKIjHASPFL 143 

F D ++ VR+ + ++ SPFL 
Sbjct: 123 FRDADQDVRQSIERVRTSPFL 143 
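
The summary line that heads each alignment gives identical and positive positions as counts over the alignment length, each followed by a percentage. As an illustration only, the minimal Python sketch below extracts those counts so the fractions can be recomputed from the raw numbers. The function names `parse_summary` and `fraction` are hypothetical, and the sketch assumes the clean `Name = n/m (p%)` layout shown in the summary line for this alignment.

```python
import re

# One field looks like "Identities = 61/141 (43%)"; Gaps may be absent.
FIELD_RE = re.compile(
    r"(?P<name>Identities|Positives|Gaps)\s*=\s*"
    r"(?P<num>\d+)/(?P<den>\d+)\s*\((?P<pct>\d+)%\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment_length, reported_percent)."""
    stats = {}
    for match in FIELD_RE.finditer(line):
        stats[match["name"]] = (
            int(match["num"]),
            int(match["den"]),
            int(match["pct"]),
        )
    return stats


def fraction(stats, name):
    """Recompute the fraction for one field from its raw counts."""
    count, length, _reported = stats[name]
    return count / length


summary = "Identities = 61/141 (43%), Positives = 86/141 (60%), Gaps = 2/141 (1%)"
stats = parse_summary(summary)
for name in stats:
    print(name, stats[name], round(fraction(stats, name), 4))
```

Keeping the reported percentage next to the raw counts makes it easy to spot a summary line whose numbers do not agree with one another.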

A related DNA sequence was identified in S.pyogenes <SEQ ID 6469> which encodes the amino acid 
sequence <SEQ ID 6470>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2295 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 109/146 (74%), Positives = 128/146 (87%)

Query: 1 MTTYFDNPLKTNQAYMLHGTJUILPIKPKTKVAIVTCmSRIJimQALGIJUCDM 60 

+ +YF++P+ NQAY LHGTAHLP+KPKTIOaiVTCMDSELHVAOMUSMJDGnAHIIJaj 
Sbjct: 1 r^MSYFEHF^mANQRYWU:lHGTAHLPLKPKTKVAIVTC»SRI^HVAQAI^^ 60 

Query: 61 AGGRVTDDVLRSLVISQQQLGTREIVVLHHTDCQRQTFENBAPAAQLQRDLGVDMHGHDP 120 

aGGRVT+D+H-RSLVISQQQ+GTREIWLHHTDCGAQTFTNE FA + LGVD+ G DF 
Sbjct: 61 AGGRVTEDMIRSLVISQQQiraTREIVVXJJHTDCGAQTETOEGFAKHIHEHLGTO 120 

Query: 121 LPENDIBESVRBDVTSJKLHASPFDREE 146 

LPF D+E+SVRED+AK+ AS + ++ 
Sbjct: 121 LPFQDVEDSVREDMAKIRASSLISDD 146 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2336 

A DNA sequence (GBSx2486) was identified in S.agalactiae <SEQ ID 7163> which encodes the amino
acid sequence <SEQ ID 7164>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0932 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:A2U308811 GB:2^004955 phosphoribosylaminoimidazole carboxylase, 
catalytic subunit [Pseudomonas aeruginosa] 
Identities = 20/27 (74%), Positives = 26/27 (96%)

Query: 1  MFKHAEEARGRGIKIIIAGAGGAAHLP 27

          +F++AEEA GRG+++IIAGAGGAAHLP
Sbjct: 46 LFQYAEEAEGRGLEVIIAGAGGAAHLP 72

There is also homology to SEQ ID 910: 

Identities = 27/27 (100%), Positives = 27/27 (100%)

Query: 1  MFKHAEEARGRGIKIIIAGAGGAAHLP 27

          MFKHAEEARGRGIKIIIAGAGGAAHLP
Sbjct: 87 MFKHAEEARGRGIKIIIAGAGGAAHLP 113

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2337 

A DNA sequence (GBSx2488) was identified in S.agalactiae <SEQ ID 7165> which encodes the amino
acid sequence <SEQ ID 7166>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.85 Transmembrane 58 - 74 ( 53 - 80) 
INTEGRAL Likelihood = -5.79 Transmembrane 103 - 119 ( 101 - 122) 

Final Results 

bacterial membrane - 
bacterial outside - 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ IDs 880 and 9278. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2338 

A DNA sequence (GBSx2489) was identified in S.agalactiae <SEQ ID 7167> which encodes the amino 
acid sequence <SEQ ID 7168>. This protein is predicted to be short chain alcohol dehydrogenase. Analysis 
of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1742 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9357> which encodes amino acid sequence <SEQ ID 9358> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD06605 GB:AE001530 putative oxidoreductase [Helicobacter
pylori J99] 

Identities = 68/94 (72%), Positives = 79/94 (83%)

Query: 4 IDLLVmAGLMX3LDKSYEADFGD5#rrMINTNVVGLIYLTRCILPKMVEVNRGLIINLGS 63 

ID L+NNAGLALGL+K+YB + DW MI+TN+ GL++LTR ILP M+E ++G IINLGS 
Sbjct: 76 IDAIiINNAGIJa.GIMCAYECELDDWEVMIDTNIKGI.LHLTRLILESMIEHDQGTIINLGS 135 

Query: 64 XAGTIPYPGAlWYeASKAFVKQFSIiNLRflDLAGT 97 

AGT YPG imGASKAPVKQPSIJSlIJiaDLaGT 
Sbjct: 136 lAGTYAYPGGNVYGASKAFVKQFSUILRaDLaGT 169 

A related DNA sequence was identified in S.pyogenes <SEQ ID 7169> which encodes the amino acid
sequence <SEQ ID 7170>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9121> which encodes the amino acid sequence 
<SEQ ID 9122>. Analysis of this protein sequence reveals the following:

Possible site: 12 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 



